Figure 1. Windows Git Bash Terminal
Figure 2. RStudio
Figure 3. GitHub Homepage
Creating a directory for my portfolio in my documents folder
First of all, create the folder using the following command in Git Bash:
mkdir MICB425_portfolio
Then create a repository on GitHub titled MICB425_portfolio, but do not initialize it with README, .gitignore, or license files.
Initializing the portfolio repo
Return to Git Bash and change the directory to MICB425_portfolio:
cd MICB425_portfolio
Then use the following command to initialize it:
git init
Pushing files to GitHub
First, use the following command to add all items within the directory to the index:
git add .
Then, commit the files to the repo using the following command, with the text in quotations used to annotate the commit:
git commit -m "blah blah blah"
Next, use the URL from the GitHub GUI Webpage where the repo was started in the following command:
git remote add origin https://remote_repository_URL
Finally, push the repo to GitHub:
git push -u origin master
Adding files to my GitHub repository
To add files, use the add, commit, and push commands as follows:
git add .
Note: Replace the period with a file name to add only that file rather than the whole directory.
git commit -m "Blah Blah Blah"
git push
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown.
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
#Load libraries
library(tidyverse)
## -- Attaching packages ------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.7.2 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts ---------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(phyloseq)
#Load Data
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t")
load("phyloseq_object.RData")
#Convert Phyloseq data to percent
physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
# SiO2 vs Depth, with purple triangles
ggplot(metadata, aes(x=SiO2_uM, y=Depth_m)) +
  geom_point(color="purple", shape=17)
# Temperature_F vs Depth_m
metadata %>%
  mutate(Temperature_F = Temperature_C*9/5+32) %>% # Convert temperature to F and pipe it to ggplot
  ggplot(aes(x=Temperature_F, y=Depth_m)) +
  geom_point()
# Bar plot of percent relative abundance per sample, filled by Class
plot_bar(physeq_percent, fill="Class") +
  geom_bar(aes(fill=Class), stat="identity") +
  labs(x="Sample Depth", y="Percent relative abundance")
# Nutrient concentrations vs depth, one panel per nutrient
metadata %>%
  gather(Nutrient, uM, NH4_uM, NO2_uM, NO3_uM, O2_uM, PO4_uM, SiO2_uM) %>% # Put all the desired data into a single column
  ggplot(aes(x=Depth_m, y=uM)) +
  geom_point() +
  geom_line() + # Do both point and lines for the graph
  facet_wrap(~Nutrient, scales="free_y") +
  theme(legend.position="none")
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
What were the primary methodological approaches used?
The authors estimated cell densities for various environments and extrapolated them to the global number of cells. They drew the values for each environment from the literature.
Summarize the main results or findings.
The total carbon contained in prokaryotes is 60-100% of the total carbon in plants, and prokaryotes contain 10-fold more nutrients than plants. Also, the highest cellular productivity is found in the open ocean; therefore, mutations and other rare genetic events are more likely to appear in marine populations than in others. Finally, the subsurface turnover time is far longer than that found in other ecosystems.
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
The assumptions regarding population density were well described, and the extrapolations to whole global-biome populations seem justified, but some calculations did not make sense. Specifically, the carbon content, efficiency, and turnover time calculations seemed to use values that did not have supporting evidence. Also, there were so many assumptions that it was hard to tell which statements were established and which were hypothetical. The tables were laid out well, though, showing all values and not just the totals for each biome.
Comment on the emergence of microbial life and the evolution of Earth systems
Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.
Hadean
4.6 GA: Solar system formed, inner planets received water vapor and carbon
4.5 GA: Moon formed and gave Earth spin and tilt, day/night cycle, and seasons
4.5 GA - 4.1 GA: High levels of CO2 increased temperature during times of the weak early sun.
4.4 GA: Zircon formation: oldest mineral
4.4 GA - 4.1 GA: meteorite impacts
4.1 GA: Evidence of life in zircon and from carbon isotopes
4 GA: Oldest rock: Acasta gneiss and evidence of plate subduction
Archean
3.8 GA: Existence of life: from sedimentary rocks and methanogenesis
3.5 GA: Microfossils and stromatolites present
3.5 GA - 2.7 GA: Cyanobacteria photosynthesize
2.7 GA: Great oxidation event: responsible for glaciation
Proterozoic
2.5 GA - 1.5 GA: red rock beds observed: evidence of oxidation
1.7 GA: Eukaryotes appear
1.1 GA: Snowball Earth occurs
Phanerozoic
540 MA: Cambrian explosion: increased diversity of life and larger organisms
Land plants observed
250 MA: Permian extinction: 95% species extinct
Gigantism of organisms
65 MA: Cretaceous/Paleogene Extinction
Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:
Hadean
There was a lot of CO2 to keep the Earth warm, as the sun was weak back then. Earth was mostly molten rock and very hot.
Archean
The atmosphere was rich in CH4, which kept the Earth warm. As photosynthesis evolved, some O2 became present.
Proterozoic
O2 reacted with atmospheric methane to produce CO2. This caused a net decrease in the greenhouse effect, making Earth cold and leading to glaciation. Oxygen also started oxidizing iron, producing the banded iron formations seen in sedimentary rock.
Phanerozoic
Oxygenation of the atmosphere increased. Plants started to evolve and became established on Earth. Coal deposits developed as organisms died in extinctions and were stored in sediments. There were occasional glaciation periods.
How has humanity affected anthropogenic markers of functional changes in the Earth system, and how is the Anthropocene distinguished from the Holocene?
They reviewed various lines of evidence to track stratigraphic signatures over time. Some of the markers they investigated were standard epoch markers like ice core NO3, temperature, CO2, and methane, but they also used novel markers specific to human activity like plutonium deposition and concrete and plastic production. They also looked at vertebrate extinction rates.
The stratigraphic signatures they discuss are either completely novel with respect to the Holocene or quantitatively outside the Holocene range of variation, and they are accelerating. The exact boundary of the Anthropocene still needs to be determined, and their evidence should inform that decision.
Is it helpful to formalize the Anthropocene, or is it better to leave it as an informal, albeit solidly founded, geological time term, as the Precambrian and Tertiary currently are?
There was no methods section, so it was a challenge to follow how the data were collected. It looks like they used references for all of the data, but there are so many references that their findings are hard to validate.
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
Aquatic - The majority of prokaryotic life in aquatic environments is found in the open ocean. These cells have a short turnover time and therefore a high cellular productivity, which means that mutations and other rare genetic events are more likely to occur here than in other habitats.
Total abundance: 1.180 x 10^29
Soil - Major reservoir of organic carbon; prokaryotes are essential in soil decomposition
Total abundance: 2.556 x 10^29
Subsurface - Major habitat for prokaryotes, with most of the subsurface biomass supported by organic matter deposited from the surface.
Total abundance: 3.8 x 10^30
Upper 200 m of the ocean: 3.6 x 10^28 cells
Density: 5 x 10^5 cells/mL
Fraction represented by cyanobacteria, including Prochlorococcus:
(4 x 10^4 cells/mL) / (5 x 10^5 cells/mL) x 100 = 8%
Marine cyanobacteria such as Prochlorococcus obtain their energy from sunlight via photosynthesis, producing oxygen while fixing carbon. Despite making up only 8% of the prokaryotic cells in the upper 200 m, they are responsible for approximately 50% of the oxygen in the atmosphere and contribute greatly to carbon cycling, as demonstrated by their quick turnover time and the resulting production of 8.2 x 10^29 cells/year.
Autotrophs: in this text, these are bacteria that produce their own food, primarily using energy from the sun. As a result, they are often found in surface environments that receive some sunlight. They are <10% of upper-layer marine prokaryotes. However, they form the majority of prokaryotes in soil and subsurface? Thus, they are defined as primarily land-dwelling organisms. “Self-nourishing”; fix inorganic carbon (CO2) → biomass.
Heterotrophs: make up the majority of prokaryotic organisms and assimilate organic carbon. Among the prokaryotes found below 200 m, they are defined as the most abundant sea-dwelling organisms.
Lithotrophs: subsurface prokaryotes that generate energy from inorganic substrates. They are described as mysterious, are primarily found in subsurface environments, and are scarcer than other types of prokaryotes.
The Mariana Trench is the deepest part of the ocean, and we know that it supports prokaryotic life even though almost no light reaches that depth. Therefore, it is the deepest habitat known to support life. Because the paper deduces that subsurface sediments below the water column also contain prokaryotes, we could argue that the deepest habitat hosting prokaryotic life is the subsurface sediment layer of the Trench. Subsurface environments on land may contain prokaryotes at depths even greater than the Mariana Trench. However, not much is currently known about life at these depths, due to the challenges of retrieving uncontaminated samples. The text explains that the limited carbon available in subsurface environments means that the majority of these organisms are metabolically inactive or non-viable. However, evidence shows that metabolic activity is on par with that of surface prokaryotes. Because most of the carbon is supplied from the surface, the primary limiting factor would be the transfer of carbon nutrients from the surface to deeper subsurface environments, which logically decreases with depth.
Temperature increases with depth by about 22 °C/km.
Based on information provided in the text and your knowledge of geography, what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
Prokaryotes have been found in the atmosphere at altitudes as high as 57-77 km. Mount Everest (8,848 meters, 8.8 km) is the highest geographical location on Earth, and therefore would technically be the highest habitat capable of supporting prokaryotic life. Is it capable of supporting prokaryotic life? Primary limiting factors at this height include temperature, although some prokaryotes, the psychrophiles, have adapted to such low temperatures. Nutrients are also limited at high altitude: fewer atoms are found in the upper atmosphere, so less material is available to compose the building blocks of life, which would result in slower growth. UV radiation and pressure also limit life at high altitudes because they can damage cells.
Lower limit:
* The Mariana Trench is 10,994 meters deep, but the lower limit is much deeper since it includes subsurface sediments, which extend about 4.5 km further.
Upper limit:
* Mount Everest is 8,848 m high, but the upper limit is much higher if the atmosphere is included as a “habitat”.
Vertical distance of the Earth’s biosphere: 19.84 km + 4.5 km ≈ 24 km (+ potential atmosphere)
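As a quick check of the arithmetic above, here is a minimal R sketch that adds up the three distances quoted in this answer. The variable names are illustrative, and the atmosphere is left out because no upper bound for it is fixed in the answer.

```r
# Depths and heights as quoted in the answer above, converted to km.
trench_depth_km   <- 10.994  # Mariana Trench
sediment_depth_km <- 4.5     # subsurface sediments below the trench floor
everest_height_km <- 8.848   # Mount Everest

# Total vertical extent of the biosphere, ignoring the atmosphere.
vertical_extent_km <- trench_depth_km + sediment_depth_km + everest_height_km
print(round(vertical_extent_km, 1))
```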
Annual cellular production (in units of 10^29 cells/year) was calculated with the following formula: Cells/year = population size x (365 / turnover time [days])
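The formula can be sketched as a small R function. This is only an illustration: the function name is made up here, and the inputs (3.6 x 10^28 cells and a 16-day turnover) are the upper-200 m values used elsewhere in this problem set.

```r
# Annual cellular production from standing stock and turnover time:
# cells/year = population size * (365 / turnover time in days)
annual_cell_production <- function(population_cells, turnover_days) {
  population_cells * (365 / turnover_days)
}

# Illustrative inputs from this problem set (upper 200 m of the ocean).
upper_ocean_cells <- 3.6e28  # cells
turnover_days     <- 16      # days per turnover

production <- annual_cell_production(upper_ocean_cells, turnover_days)
print(signif(production, 2))
```

With these inputs the result should land near the 8.2 x 10^29 cells/year quoted in the answers.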
Carbon content, along with carbon assimilation efficiency, determines the upper bound on the turnover rates seen in the upper 200 m of the ocean. This varies with depth in the ocean, and between terrestrial and marine habitats, because the abundance of carbon in each habitat is different.
Carbon assimilation efficiency: 20%
Average carbon content of a prokaryotic cell: 20 fg C/cell = 20 x 10^-30 Pg C/cell
(3.6 x 10^28 cells) x (20 x 10^-30 Pg C/cell) = 0.72 Pg C in marine heterotrophs
51 Pg C/year x 85% consumed = 43 Pg C/year
(43 Pg C/year) / (2.88 Pg C/turnover) = 14.9 turnovers/year
365 days / 14.9 turnovers = ~24 days/turnover
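The carbon budget above can be reproduced step by step with a short R sketch. The variable names are illustrative. The 2.88 Pg C per turnover figure is copied from the worked answer as given, since its derivation is not stated there, and the consumed carbon is rounded to a whole number to mirror the 43 Pg C used above.

```r
# Unit conversion: 1 Pg = 1e15 g and 1 g = 1e15 fg, so 1 Pg = 1e30 fg.
fg_per_pg <- 1e30

# Standing stock of carbon in marine heterotrophs (upper 200 m).
carbon_per_cell_pg <- 20 / fg_per_pg        # 20 fg C per cell
cells              <- 3.6e28
standing_carbon_pg <- cells * carbon_per_cell_pg

# Carbon consumed per year: 85% of 51 Pg C/year, rounded as in the text.
carbon_consumed_pg <- round(51 * 0.85)

# Carbon per turnover, taken from the worked answer (assumption: not derived here).
carbon_per_turnover_pg <- 2.88

turnovers_per_year <- carbon_consumed_pg / carbon_per_turnover_pg
days_per_turnover  <- 365 / turnovers_per_year

print(c(standing_carbon_pg, turnovers_per_year, days_per_turnover))
```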
Mutation rate: 4 x 10^-7 mutations/generation
For 4 mutations to happen at once: (4 x 10^-7)^4 = 2.56 x 10^-26 mutations/generation
365 / 16 = 22.8 turnovers/yr
(3.6 x 10^28 cells) x 22.8 turnovers/yr = 8.2 x 10^29 cells/yr
(8.2 x 10^29 cells/yr) x (2.56 x 10^-26 mutations/cell) = 2.1 x 10^4 mutations/yr
(365 d/yr x 24 h/d) / ((4 x 10^-7)^4 mutations/cell x 8.2 x 10^29 cells/yr) ≈ 0.4 h per 4 simultaneous mutations
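The chain of calculations for simultaneous mutations can be sketched in R as follows. The variable names are illustrative. The population (3.6 x 10^28 cells) and the 16-day turnover time are the upper-200 m values used in this problem set, and the four mutations are assumed to be independent, which is what raising the rate to the fourth power implies.

```r
mutation_rate  <- 4e-7  # mutations per generation
n_simultaneous <- 4     # number of mutations required at once

# Probability of all four mutations in a single replication (independence assumed).
p_simultaneous <- mutation_rate^n_simultaneous

# Cells produced per year in the upper 200 m of the ocean.
cells_per_year <- 3.6e28 * (365 / 16)

# Expected number of four-fold mutation events per year.
events_per_year <- cells_per_year * p_simultaneous

# Average waiting time between such events, in hours.
hours_between_events <- (365 * 24) / events_per_year

print(signif(c(p_simultaneous, events_per_year, hours_between_events), 3))
```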
A large mutation rate means that there is a great potential for multiple point mutations in a single replication. This allows for quick adaptation by creating a more diverse pool of mutants to be selected from. Genetic diversity will be extremely high when small-scale changes to sequence are considered, and long-term “species” level biodiversity will mostly be determined by competition and environmental pressures. Horizontal gene transfer can allow new genes to proliferate in a microbial community, assuming the gene is successful in the organism it is “born” in.
High abundance allows for high diversity by increasing the potential for mutations and simultaneous mutations. Metabolic potential is dependent on both abundance and diversity. Diversity determines the pool of available genes to be used in metabolic pathways and abundance determines the magnitude of the effect of these pathways.
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
Geophysical - Geothermal processes, diagenesis, tectonics, erosion, mountain building, atmosphere
Biogeochemical - Microbial catalyzed redox reactions (photosynthesis, fermentation, etc.)
The abiotic processes recycle matter from the earth to be used by biotic processes, which then use the matter to create energy and pass the products of their metabolism on to other biotic processes to use. The final end products, like CO2 and CH4, are released into the atmosphere to be recycled by the abiotic processes.
The feedback between microbial evolution and biogeochemical processes has created Earth’s current redox state. Every process is linked and interconnected.
Cycles can be oxidative or reductive, yet they all feed their end products into new reactions, creating cycles.
Strategies:
- Methanogenic Archaea reduce CO2 with H2 but require a high enough H2 tension for the reaction to proceed forward; otherwise it will proceed in reverse. Certain species of methanogens exploit the otherwise unfavourable reverse reaction through synergistic cooperation with H2-consuming sulfate reducers.
The only biological process that makes N2 available for biomolecule synthesis is nitrogen fixation, a reductive process that transforms N2 to NH4+. This process is catalyzed by nitrogenase and inhibited by O2.
O2 present - NH4+ is oxidized to nitrate in a two-stage pathway: one group of microbes oxidizes ammonia to NO2-, which is subsequently oxidized to NO3- by a different group of microbes.
O2 absent - a different microbial group uses NO2- and NO3- as electron acceptors in the anaerobic oxidation of organic matter, ultimately forming N2.
There is a link between the nitrogen cycle and climate change, for without the available nitrogen provided by the nitrogen cycle life would not have been able to evolve the pathways required for oxidative phosphorylation, which has driven major geological climate changes throughout earth’s history.
While horizontal gene flow has distributed many families of genes, there still exists metabolic diversity amongst microbial communities. For example, chlorophyll-based photosynthesis is restricted to bacteria while methanogenesis is restricted to archaea. The discovery of new protein families is increasing linearly with the number of new genomes sequenced.
Due to the glacial periods of Earth’s history, life was restricted to tiny pockets of microbially habitable patches, where the metabolic pathways remained protected. Since the metabolic mechanisms have been horizontally transferred to every species on Earth, individual taxonomic units can go extinct while the core metabolic machines carry on unperturbed.
"Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides." Do you agree or disagree with this statement? Answer the question using specific reference to your reading, discussions and content from evidence worksheets and problem sets.
Microbial life has played a fundamental role in shaping Earth’s biosphere; however, I believe that humanity is more than capable of surviving without the global catalysis and environmental transformation that microbes provide. I will not argue that humanity would be able to adapt and survive following a fantastical, cataclysmic event where only microbial life was extinguished. Rather, I will argue that humanity has the capacity to understand all the biogeochemical processes provided by microbial life with the help of continually-improving computational models. Humanity also has the capacity to understand the influence our own technology has on the global environment, and we can consciously adapt our global actions as necessary to restore order to the global equilibrium. Finally, we can develop new technology to replicate and optimize every biogeochemical process on a global and potentially interplanetary scale. Through these processes, humanity will be capable of controlling and guiding the global environment to ensure the continued survival of our species without relying on evolutionarily-limited, globally-unconscious microbial life.
Before we can control the Earth, we must understand it. By learning about the history of Earth, we have been able to discover how biogeochemical cycles have driven global changes in Earth’s environment, which could help us model how such changes might occur again in the future [1]. It is thought that the changes in Earth’s biosphere that are currently underway are being driven by human activity [2]. This new era has been named the Anthropocene, and if we are to understand the changes underway and attempt to mitigate the damage to the environment and human society, we will need to use computational models to simulate global geochemical cycles. Models of this scale are not possible with our current technology, but there has been a continual improvement in the computational power and accuracy of our models for decades. For example, Falkowski et al. noted that “…we have considerable information about specific aspects of the carbon cycle, but many of the couplings and feedbacks are poorly understood” [3] because we needed a systems/integrative approach, yet not even 20 years later Wang et al. modeled global soil carbon and soil microbial carbon by combining multiple models into one integrative model [4].
Another example of the improvement in our computational models comes from meteorological simulations that model atmospheric conditions to predict the weather. Forty years ago, our 3- and 7-day forecast skills were at 80% and 40%, respectively, and as of 2013 those values are at 95% and 70%, respectively [5]. These improvements came from exponential increases in computational power, but there were other major advancements such as the global satellite data gained in the early 2000s [5]. Today, we are at the dawn of using machine learning and artificial intelligence in our predictive models, which recursively improve upon themselves to achieve an accuracy greater than any conventional model can [6]. The atmosphere is an important part of the global biosphere, and the advances in all of our computational modelling technologies will eventually lead to an all-encompassing, global model of the Earth’s biogeochemical cycles.
Humanity’s technological advancement has influenced the global environment at an unprecedented scale, and we need to be able to recognize the influence our own technology has and adjust our actions as a global community to restore order to the Earth’s equilibrium. A prime example of humanity recognizing the influence of our technology on the environment is the crisis of the ozone hole and the signing of the Montreal Protocol. In 1973, it was discovered by the future Nobel laureates Paul Crutzen, Mario Molina, and F. Sherwood Rowland that chlorofluorocarbons (CFCs), a class of compounds commonly used as refrigerants and flame-retardants, were responsible for the depletion of the ozone layer [7]. It took fourteen years, but in 1987, an international treaty known as the Montreal Protocol was agreed upon, which resulted in the phasing out of all CFCs and other related substances in hopes of protecting the ozone layer [8]. Thirty years later we can now say that the Montreal Protocol was a success, and that there has been a decline in Antarctic ozone depletion as well as lower stratospheric chlorine, a byproduct of CFCs [9]. Humanity was able to recognize the influence our technology has on the ozone layer and come together to solve the problem, but we are currently going through another global environmental change due to our technology that requires us to come together as a global community once again. At the World Climate Conference in 1979 it was noted that increased CO2 in the atmosphere can contribute to a gradual warming of the planet [10]. There has been a lot of research on the topic of climate change since then, and it can be said without a doubt that there has been a rapid increase in atmospheric CO2 due to human activities [3]. There has been no effective international treaty to reduce the human impact on climate change yet, but the Paris Climate Accord is set to come into effect in 2020.
The Paris Climate Accord is an international treaty that contains plans for greenhouse gas emissions mitigation, adaptation, and financing so that humanity can keep the global temperature rise to well below 2°C [11]. It is unknown at this time if these measures will succeed in mitigating climate change, but measures like these are a sign that humanity is capable of recognizing our own impact on the global environment and can consciously change our global behavior.
While the Paris Climate Accord merely aims to mitigate greenhouse gas emissions, new technology has the potential for us to not only mitigate emissions but to actually reduce the amount of CO2 present in the atmosphere. There are multiple methods of carbon fixation currently under research, and they primarily fall under two related categories: synthetic metabolic pathways and artificial photosynthesis. Schwander et al. designed a continuous synthetic carbon fixation pathway involving “…17 enzymes from nine different organisms and all three domains of life” [12] that is five times more efficient than the most common natural carbon fixation pathway. A few of the enzymes used were rationally engineered to catalyze the desired reactions, and the pathway clearly demonstrates the ingenuity of humanity to design a “…synthetic alternative that [does] not require the serendipity of evolution to bring together all components in space and time” [12]. Synthetic pathways like Schwander et al.’s can be used in a variety of ways, such as in engineered photosynthetic organisms to improve CO2 fixation or in completely artificial photosynthetic processes like artificial leaves that rely on photovoltaics and other catalytic technologies [13]. Technological carbon fixation is not feasible at a global scale yet, but Rheticus is a new joint research project backed by major electrical and chemical companies to demonstrate the feasibility of artificial photosynthesis by using electricity from renewable sources and genetically engineered bacteria to convert CO2 into specialty chemicals at a pilot-plant scale [14]. Ventures such as this may eventually bring industrial carbon fixation into the same realm as industrial nitrogen fixation by the Haber-Bosch process, which is responsible for half of the Earth’s nitrogen fixation [15].
The Earth will continue to revolve around the sun no matter what humanity does to it, but it is up to humanity to understand Earth’s biogeochemical cycles and the influence our rapidly-advancing technology has on them if we want to take the captain’s chair on “Spaceship Earth” [16]. Biogeochemical cycles have been a popular field of research for many years, but it’s only been since the dawn of computers that we have been able to create complex computational models. With modern computational tools, we can integrate multiple different models into a single, unified model and use machine learning technology to analyze and make predictions from data sets larger than any human could comprehend. We can also use models to make predictions about the way our own technology and its by-products will affect the global environment. Designing and implementing international treaties such as the Montreal Protocol and the Paris Climate Accord is how humanity demonstrates that it is the true captain of “Spaceship Earth”. Microbes cannot come together as a global community to consciously guide Earth’s biosphere, but humanity has demonstrated it is more than capable of doing so. And as we develop new artificial biogeochemical cycle technologies like synthetic metabolic pathways, artificial photosynthesis, and other unknown advancements, we will continually enhance our capability to guide Earth’s biosphere. In the far future, humanity may have such a firm grasp on synthetic biogeochemical cycles that we not only control Earth’s biosphere but are also capable of terraforming other planets such as Mars or Venus, creating a robust, interplanetary biosphere.
[1] Nisbet, E. G., & Sleep, N. H. (2001). The habitat and nature of early life. Nature, 409(6823), 1083-1091. https://doi.org/10.1038/35059210
[2] Waters, C. N., Zalasiewicz, J., Summerhayes, C., Barnosky, A. D., Poirier, C., Gałuszka, A., . . . Wolfe, A. P. (2016). The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science (New York, N.Y.), 351(6269), aad2622-aad2622. https://doi.org/10.1126/science.aad2622
[3] Falkowski, P., Scholes, R. J., Boyle, E., Canadell, J., Canfield, D., Elser, J., . . . Steffen, W. (2000). The global carbon cycle: A test of our knowledge of earth as a system. Science, 290(5490), 291-296. https://doi.org/10.1126/science.290.5490.291
[4] Wang, K., Peng, C., Zhu, Q., Zhou, X., Wang, M., Zhang, K., & Wang, G. (2017). Modeling global soil carbon and soil microbial carbon by integrating microbial processes into the ecosystem process model TRIPLEX GHG. Journal of Advances in Modeling Earth Systems, 9(6), 2368-2384. https://doi.org/10.1002/2017MS000920
[5] Bauer, P., Thorpe, A., & Brunet, G. (2015). The quiet revolution of numerical weather prediction. Nature, 525(7567), 47. https://doi.org/10.1038/nature14956
[6] Jones N. 2017. How machine learning could help to improve climate forecasts. Nature 548:379-380. https://doi.org/10.1038/548379a
[7] Nobelprize.org. (1995, Oct. 11), The Nobel Prize in Chemistry 1995. Retrieved from Nobelprize.org
[8] Ozone Secretariat. (2017). The Montreal Protocol on Substances that Deplete the Ozone Layer. Retrieved from: http://ozone.unep.org
[9] Strahan SE, Douglass AR, Newman PA, Steenrod SD. 2014. Inorganic chlorine variability in the Antarctic vortex and implications for ozone recovery. Journal of Geophysical Research: Atmospheres 119. https://doi.org/10.1002/2014JD022295
[10] World Meteorological Organization. (1979). Declaration of the World Climate Conference. World Meteorological Organization. https://library.wmo.int/pmb_ged/wmo_537_en.pdf
[11] UN Treaty Collection. (2015, Dec. 12). Paris Agreement. Retrieved from: https://treaties.un.org
[12] Schwander, T., Schada von Borzyskowski, L., Burgener, S., Cortina, N. S., & Erb, T. J. (2016). A synthetic pathway for the fixation of carbon dioxide in vitro. Science, 354(6314), 900-904. https://doi.org/10.1126/science.aah5237
[13] Berardi S, Drouet S, Francàs L, Gimbert-Suriñach C, Guttentag M, Richmond C, Stoll T, Llobet A. 2014. Molecular artificial photosynthesis. Chem Soc Rev 43:7501-7519. https://doi.org/10.1039/c3cs60405e
[14] Siemens and Evonik. (2018, Jan. 18). Evonik and Siemens to generate high-value specialty chemicals from carbon dioxide and eco-electricity. Retrieved from https://www.siemens.com
[15] Canfield, D. E., Glazer, A. N., & Falkowski, P. G. (2010). The evolution and future of earth’s nitrogen cycle. Science, 330(6001), 192-196. https://doi.org/10.1126/science.1186120
[16] Achenbach, J. (2012, Jan. 2). Spaceship Earth: A new view of environmentalism. The Washington Post. Link
Achenbach, J. (2012, Jan. 2). Spaceship Earth: A new view of environmentalism. The Washington Post. Link
Canfield, D. E., Glazer, A. N., & Falkowski, P. G. (2010). The evolution and future of earth’s nitrogen cycle. Science, 330(6001), 192-196. https://doi.org/10.1126/science.1186120
Falkowski, P., Scholes, R. J., Boyle, E., Canadell, J., Canfield, D., Elser, J., . . . Steffen, W. (2000). The global carbon cycle: A test of our knowledge of earth as a system. Science, 290(5490), 291-296. https://doi.org/10.1126/science.290.5490.291
Kasting JF. 2002. Life and the Evolution of Earth’s Atmosphere. Science 296:1066-1068. https://doi.org/10.1126/science.1071184
Nisbet, E. G., & Sleep, N. H. (2001). The habitat and nature of early life. Nature, 409(6823), 1083-1091. https://doi.org/10.1038/35059210
Rockström J, Steffen W, Noone K, Persson Å, Chapin FS, Lambin EF, Lenton TM, Scheffer M, Folke C, Schellnhuber HJ, Nykvist B, Wit CAD, Hughes T, Leeuw SVD, Rodhe H, Sörlin S, Snyder PK, Costanza R, Svedin U, Falkenmark M, Karlberg L, Corell RW, Fabry VJ, Hansen J, Walker B, Liverman D, Richardson K, Crutzen P, Foley JA. 2009. A safe operating space for humanity. Nature 461:472-475. https://doi.org/10.1038/461472a
Waters, C. N., Zalasiewicz, J., Summerhayes, C., Barnosky, A. D., Poirier, C., Gałuszka, A., . . . Wolfe, A. P. (2016). The anthropocene is functionally and stratigraphically distinct from the holocene. Science (New York, N.Y.), 351(6269), aad2622-aad2622. https://doi.org/10.1126/science.aad2622
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578-6583. PMC33863
Discuss the relationship between microbial community structure and metabolic diversity
Evaluate common methods for studying the diversity of microbial communities
Recognize basic design elements in metagenomic workflows
The main goal was to more fully describe proteorhodopsin (PR) photosystem genetics and biochemistry. PRs are retinal-containing proteins that catalyze light-activated proton efflux across the cell membrane and are found globally in the ocean’s photic zone and in a diverse array of Bacteria and Archaea. What is the minimal set of genes that must be transferred to a heterologous host to confer the phenotype?
* High-density colony macroarrays - Screening for PR expression
* Fosmid library screen - Screened for PR-containing clones on retinal-containing LB agar plating medium - Used a copy-control system that allowed a controlled transition from one copy per cell to multiple (up to 100) vector copies upon addition of the inducer L-arabinose
* HPLC Analysis - For separation of carotenoids
* Proton-Pumping Experiments - Change in pH of a water bath as a result of a clone’s proton pump was the detection method
* ATP Production Assays - ATP measured with a luciferase-based assay
Took DNA from the environment, created a library out of it, and performed a screen in E. coli for a phenotype with genes it does not normally have. - A powerful way of mining the uncultivated diversity found around the world and using it to engineer microbes to discover the minimal number of genes required to generate a specific phenotype
Discovered that it only required 7 genes to create the PR photosystem phenotype, which is a small enough number of genes that they could easily fit on a single F1 plasmid, enabling extensive horizontal gene transfer - Importantly, it is apparent that the PR photosystem is ubiquitous among diverse microbial taxa
Do new questions arise from the results?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)
via 16S rRNA databases
2016
+ 89 bacterial phyla
+ 20 archaeal phyla
+ BUT, could be up to 1500 bacterial phyla, as there could be microbes that live in the “shadow biosphere”
2003
+ 26 of 52 major bacterial phyla have been cultured - probably more now!
How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?
How many: Many thousands - always changing eg) 110217 on EBI database
Types of environments - ALL (sediment, soil, gut, aquatic…) esp. those where it’s hard to culture communities in lab settings
What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?
There are many on-line resources available within the following categories:
Shotgun metagenomics
* Assembly ex) Velvet
* Binning ex) TACOA
* Annotation ex) Bowtie
* Analysis pipelines ex) eggNOG
Marker gene metagenomics
* Standalone software ex) Mothur
* Analysis pipelines ex) SILVA - gold standard
* Denoising ex) DADA
* Databases ex) Ribosomal Database Project (RDP) - gold standard
Sourced from: Oulas, A., Pavloudi, C., Polymenakou, P., Pavlopoulos, G. A., Papanikolaou, N., Kotoulas, G., . . . Iliopoulos, I. (2015). Metagenomics: Tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinformatics and Biology Insights, 2015(9), 75-88. https://doi.org/10.4137/BBI.S12462
ALSO:
IMG/M - large repo
MG-RAST - another large repo
NCBI - database connected to many other databases
Phylogenetic
Functional
not as useful as phylogeny
What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
Metagenomic sequence binning is the process of grouping sequences that come from a single genome
Types of algorithms:
1. Assign sequences to a database
2. Group to each other based on DNA characteristics: GC content, codon usage
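The second approach (grouping by composition) can be illustrated with a very small sketch. This is only a toy example under stated assumptions: the three contigs are made-up sequences, `gc_content` is a hypothetical helper, and real binning tools combine several signals (such as codon or oligonucleotide usage and coverage) rather than GC content alone.

```r
# Hypothetical toy contigs; real bins would use assembled metagenomic contigs
contigs <- c(contig_1 = "ATGCGCGGCCGCTAGC",
             contig_2 = "ATATTTAAATATGCAT",
             contig_3 = "GGCCGCGCGATCGCGG")

# Hypothetical helper: fraction of G and C bases in one sequence
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# GC fraction for every contig
gc <- sapply(contigs, gc_content)

# Group contigs into two provisional bins by hierarchical clustering on GC fraction
bins <- cutree(hclust(dist(gc)), k = 2)
data.frame(gc = round(gc, 2), bin = bins)
```

Contigs with similar GC fractions end up in the same provisional bin, which also shows the contamination risk: unrelated genomes with similar composition would be grouped together.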
Risks & Opportunities in binning.
Risks:
+ incomplete coverage of genome
+ contamination from different phylogeny
Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?
FISH probes
CANCELLED
Madsen EL. 2005. Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology 3:439-446. https://doi.org/10.1038/nrmicro1151
Martinez A, Bradley AS, Waldbauer JR, Summons RE, Delong EF. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proceedings of the National Academy of Sciences 104:5590-5595. https://dx.doi.org/10.1073/pnas.0611470104
Wooley JC, Godzik A, Friedberg I. 2010. A Primer on Metagenomics. PLoS Computational Biology 6. https://doi.org/10.1371/journal.pcbi.1000667
Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
Explain the relationship between microdiversity, genomic diversity and metabolic potential
Comment on the forces mediating divergence and cohesion in natural microbial communities
To understand the genetic bases for pathogenicity and the evolutionary diversity of E. coli by analyzing the genome sequence of E. coli CFT073, a pathogenic strain isolated from the blood of a woman with acute pyelonephritis and comparing it with the genome sequences of enterohemorrhagic E. coli strain EDL933 and the nonpathogenic laboratory strain MG1655
They cloned and sequenced an isolated strain of E. coli by using dye-terminator chemistry (Sanger Sequencing). Finishing used sequencing of opposite ends of linking clones, PCR-techniques, and primer walking.
Sequence analysis and annotation was done with MAGPIE, GLIMMER (to define ORFs), and BLAST.
They generated the complete genome sequence of the uropathogenic E. coli strain CFT073 and compared it to the EDL933 and MG1655 strains, and they found that only 39.2% of their combined set of proteins are common to all three strains.
The disease potential of CFT073 is reflected in the absence of genes for a type 3 secretion system or the phage- and plasmid-encoded toxins that are found in some diarrheagenic E. coli. CFT073 is rich in genes for fimbrial adhesins, autotransporters, iron-sequestration systems, and phage-switch recombinases.
The common core of the backbone has been preserved for generations. Genes within the islands of the backbones are more likely to be horizontally transferred. Overall, survival is preserved vertically but pathogenicity is transferred horizontally.
“Black holes”, meaning genes that are detrimental to a uropathogenic lifestyle and have been lost, are a challenge to assess due to a lack of sequences to compare against.
What is a species in a microbial world? Should the large differences in overall genome content be part of the definition, or should the definition focus on the backbones which are common?
It would have been nice if they had presented more data about the other strains they had used (Table 1)
One of the limitations of their analysis is that they only used three strains of E. coli when there are many more strains. With more strains, the overall variance may be less (sampling bias).
Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
Identify common molecular signatures used to infer genomic identity and cohesion
Differentiate between mobile elements and different modes of gene transfer
Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.
The top half refers to CFT073 and the bottom half refers to EDL933. Each line relates to a gene island, with the size of the line indicating the size of the island and the position along the line representing its position in the genome. The labels refer to islands that are located at tRNAs. Asterisks indicate islands at the same backbone position in both strains.
The different strains reside in different environments, which is why there is ecotype diversity.
The common core of the backbone has been preserved for generations. Genes within the islands of the backbone are more likely to be horizontally transferred. Overall, survival is preserved vertically but pathogenicity is transferred horizontally.
Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.
Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.
#Libraries
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 3.4.4
library(tidyverse)
library(knitr)
library(vegan)
## Warning: package 'vegan' was built under R version 3.4.4
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
library(phyloseq)
#Data and Table 1 Shared by Ian Lee
Table_1 = data.frame(
Number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
Name = c("Rigoa","Skittles","MandMs","MikeandIkes","Gummybears","Lego","Gumdrops","fruitgummies","macrophage","cokebottles","Gummywhitedrops","Watermelon","RedGreenFish","Kisses","Redsnakes"),
Characteristics = c("long gummies","sour candy with shell","Chocolates with shell","Long chewy beans","Bear shaped gummies","Brick shaped hard candy","large round chewy candy with hard shell","fruit shaped gummies","octopus shaped gummies coated sugar","coke bottle shaped gummies","striped disk gummies coated suger","watermelon coloured and sphere shaped gummies","red and green fish shaped gummies","teardrop shaped chocolates","long thin red snake gummies"),
Occurences = c(7,197,218,199,91,18,24,2,6,3,3,1,1,16,13)
)
Table_1 %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| Number | Name | Characteristics | Occurences |
|---|---|---|---|
| 1 | Rigoa | long gummies | 7 |
| 2 | Skittles | sour candy with shell | 197 |
| 3 | MandMs | Chocolates with shell | 218 |
| 4 | MikeandIkes | Long chewy beans | 199 |
| 5 | Gummybears | Bear shaped gummies | 91 |
| 6 | Lego | Brick shaped hard candy | 18 |
| 7 | Gumdrops | large round chewy candy with hard shell | 24 |
| 8 | fruitgummies | fruit shaped gummies | 2 |
| 9 | macrophage | octopus shaped gummies coated sugar | 6 |
| 10 | cokebottles | coke bottle shaped gummies | 3 |
| 11 | Gummywhitedrops | striped disk gummies coated suger | 3 |
| 12 | Watermelon | watermelon coloured and sphere shaped gummies | 1 |
| 13 | RedGreenFish | red and green fish shaped gummies | 1 |
| 14 | Kisses | teardrop shaped chocolates | 16 |
| 15 | Redsnakes | long thin red snake gummies | 13 |
#The "organisms" in this table account for all candy given to us; none were left out. Rare or unclassifiable species were given their own bin with descriptions on how they differed.
#Create Data Table for Rarefy
Table_2 = data.frame(
Number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
Occurences = c(7,197,218,199,91,18,24,2,6,3,3,1,1,16,13)
)
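#Rarefaction Curve from Vegan package
#rarecurve(data, step size, labels) expects a community matrix: samples as rows, species as columns
#Table_2 holds one species per row, so the counts are transposed into a single sample row
rarecurve(t(Table_2["Occurences"]), step = 1, xlab = "Number of Individuals Sampled", ylab = "Number of Observed Species", label = TRUE)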
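As a rough check on what a rarefaction curve represents, the expected number of species in a random subsample of n individuals can be computed directly from the counts with Hurlbert's formula, the formulation that vegan's rarefy() is based on. The sketch below is illustrative only: `expected_richness` is a hypothetical helper name, and `counts` simply repeats the Occurences column from Table 1.

```r
# Counts per candy "species", copied from the Occurences column of Table 1
counts <- c(7, 197, 218, 199, 91, 18, 24, 2, 6, 3, 3, 1, 1, 16, 13)

# Hypothetical helper: expected number of species in a random subsample of
# n individuals drawn without replacement (Hurlbert's rarefaction formula)
expected_richness <- function(counts, n) {
  N <- sum(counts)
  # Probability that species i is absent from the subsample:
  # choose(N - N_i, n) / choose(N, n), computed on the log scale
  p_absent <- exp(lchoose(N - counts, n) - lchoose(N, n))
  sum(1 - p_absent)
}

# Expected richness at a few subsample sizes, including the full community
sizes <- c(1, 10, 100, sum(counts))
data.frame(n = sizes,
           expected = sapply(sizes, expected_richness, counts = counts))
```

Plotting the expected richness against n traces the same kind of curve as rarecurve(): it rises quickly while common species are being picked up and flattens as only rare species remain to be found.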
Using the table from Part 1, calculate species diversity using the following indices or metrics.
# Modify Table 1 to be a community matrix, i.e. samples as rows (you will only have 1), "species" as columns, and the encompassed data as counts of those "species" in the sample
community = Table_1 %>%
# Select only name and count columns
select(Name, Occurences) %>%
# Spread into taxa as columns format
spread(Name, Occurences) %>%
# Convert to phyloseq OTU table data type
otu_table(taxa_are_rows = FALSE)
# Create a random subsample (rarefy) the data to 100 observations
subsample = as.data.frame(rarefy_even_depth(community, sample.size=100, replace=FALSE, rngseed=762))
## `set.seed(762)` was used to initialize repeatable random subsampling.
## Please record this for your records so others can reproduce.
## Try `set.seed(762); .Random.seed` for the full vector
## ...
## 3OTUs were removed because they are no longer
## present in any sample after random subsampling
## ...
#Make a table for the Sampling
subsample %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10)
| fruitgummies | Gumdrops | Gummybears | Gummywhitedrops | Kisses | Lego | macrophage | MandMs | MikeandIkes | Redsnakes | Skittles | Watermelon |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7 | 13 | 1 | 4 | 3 | 1 | 26 | 24 | 1 | 18 | 1 |
#Sample Species Diversity
s1 = 1/100
s2 = 7/100
s3 = 13/100
s4 = 1/100
s5 = 4/100
s6 = 3/100
s7 = 1/100
s8 = 26/100
s9 = 24/100
s10 = 1/100
s11 = 18/100
s12 = 1/100
1/(s1^2 + s2^2 + s3^2 + s4^2 + s5^2 + s6^2 + s7^2 + s8^2 + s9^2 + s10^2 + s11^2 + s12^2)
## [1] 5.482456
#The Simpson Reciprocal Index for my random sample is 5.482456
#Overall Species Diversity
Species1 = 7/799
Species2 = 197/799
Species3 = 218/799
Species4 = 199/799
Species5 = 91/799
Species6 = 18/799
Species7 = 24/799
Species8 = 2/799
Species9 = 6/799
Species10 = 3/799
Species11 = 3/799
Species12 = 1/799
Species13 = 1/799
Species14 = 16/799
Species15 = 13/799
1/(Species1^2 + Species2^2 + Species3^2 + Species4^2 + Species5^2 + Species6^2 + Species7^2 + Species8^2 + Species9^2 + Species10^2 + Species11^2 + Species12^2 + Species13^2 + Species14^2 + Species15^2)
## [1] 4.706271
#The Simpson Reciprocal Index for the original total community is 4.706271
5.482456
4.706271
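The two hand calculations above can also be written as one small function, which avoids typing each proportion separately. This is a minimal sketch: `inverse_simpson` is a hypothetical helper name, and the two vectors repeat the counts from Table 1 and from the rarefied subsample table.

```r
# Counts for the whole community (Occurences column of Table 1)
counts <- c(7, 197, 218, 199, 91, 18, 24, 2, 6, 3, 3, 1, 1, 16, 13)

# Counts in the rarefied subsample of 100 individuals
sample_counts <- c(1, 7, 13, 1, 4, 3, 1, 26, 24, 1, 18, 1)

# Hypothetical helper: Simpson Reciprocal Index, 1 / sum(p_i^2)
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)  # proportion of each species
  1 / sum(p^2)
}

inverse_simpson(sample_counts)  # 5.482456
inverse_simpson(counts)         # 4.706271
```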
#Species = 12
#a = 5
#b = 7
12 +5^2/(2*7)
## [1] 13.78571
#Chao1 estimate for my sample is 13.78571
#Species = 15
#a = 2
#b = 13
15 +2^2/(2*13)
## [1] 15.15385
#Chao1 estimate for the overall community is 15.15385
#New Transposed Table for Overall Data
Total_Diversity=
Table_1 %>%
select(Name, Occurences) %>%
spread(Name, Occurences)
#Print
Total_Diversity
## cokebottles fruitgummies Gumdrops Gummybears Gummywhitedrops Kisses Lego
## 1 3 2 24 91 3 16 18
## macrophage MandMs MikeandIkes RedGreenFish Redsnakes Rigoa Skittles
## 1 6 218 199 1 13 7 197
## Watermelon
## 1 1
#Simpson Reciprocal Index Calculations
#Overall Data
diversity(Total_Diversity, index="invsimpson")
## [1] 4.706271
#Sample Data
diversity(subsample, index="invsimpson")
## [1] 5.482456
#Calculate chao1 estimates
#Overall Data
specpool(Total_Diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 15 15 0 15 0 15 15 0 1
#Sample Data
specpool(subsample)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 12 12 0 12 0 12 12 0 1
Simpson Diversity Matches
Chao1 Values are off for both the sample and the total community.
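One likely reason the specpool() values differ from the hand calculations is that specpool() extrapolates richness from species incidence across sampling units, and here there is only one unit (n = 1), so every estimate collapses to the observed count. An abundance-based Chao1 works from the counts within a single sample instead. The sketch below is a base R illustration of the bias-corrected form documented for vegan's estimateR(); `chao1_bc` is a hypothetical helper name, and it assumes a counts species seen exactly once and b counts species seen exactly twice, which may differ from the convention used in the hand calculation, so the numbers are not expected to match it.

```r
# Counts for the whole community and for the rarefied subsample
counts <- c(7, 197, 218, 199, 91, 18, 24, 2, 6, 3, 3, 1, 1, 16, 13)
sample_counts <- c(1, 7, 13, 1, 4, 3, 1, 26, 24, 1, 18, 1)

# Hypothetical helper: bias-corrected Chao1,
# S_obs + a * (a - 1) / (2 * (b + 1))
chao1_bc <- function(counts) {
  s_obs <- sum(counts > 0)   # observed species
  a <- sum(counts == 1)      # singletons (assumed definition)
  b <- sum(counts == 2)      # doubletons (assumed definition)
  s_obs + a * (a - 1) / (2 * (b + 1))
}

chao1_bc(counts)         # 15 + 2*1/(2*2) = 15.5
chao1_bc(sample_counts)  # 12 + 5*4/(2*1) = 22

# Optional comparison with vegan's abundance-based estimator, if installed
if (requireNamespace("vegan", quietly = TRUE)) {
  print(vegan::estimateR(counts))
  print(vegan::estimateR(sample_counts))
}
```

If vegan is available, the S.chao1 entry returned by estimateR() should agree with these values, since both work from abundances in a single sample rather than from incidence across sites.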
How does the measure of diversity depend on the definition of species in your samples? The species definition used greatly impacts the diversity of the samples. This is especially apparent in the Chao1 estimate, because species with only a single member are part of the calculation. Samples that are split into as many different species as possible will likely end up with more singleton species than if a more general species definition was used.
Can you think of alternative ways to cluster or bin your data that might change the observed number of species? The species were clustered using the type of candy they were, but alternative bins could have included colours/flavors. This would likely increase the number of species, unless the clustering was solely based on colours in which case they would have had similar numbers of species.
How might different sequencing technologies influence observed diversity in a sample? One influencing factor is sequencing depth. The different sequencing technologies all have varying sequencing depths, which can influence the observed diversity in the sample. Also, there is not yet a universally accepted region of the 16S rRNA gene to sequence, so the different primers and target regions used will impact the observed diversity.
“Discuss the challenges involved in defining a microbial species and how HGT complicates matters, especially in the context of the evolution and phylogenetic distribution of microbial metabolic pathways. Can you comment on how HGT influences the maintenance of global biogeochemical cycles through time? Finally, do you think it is necessary to have a clear definition of a microbial species? Why or why not?”
In his evolutionary book On the Origin of Species, Charles Darwin eloquently stated:
“As many more individuals of each species are born than can possibly survive; and as, consequently, there is a frequently recurring struggle for existence, it follows that any being, if it vary however slightly in any manner profitable to itself, under the complex and sometimes varying conditions of life, will have a better chance of surviving, and thus be naturally selected” [1].
This statement has defined the paradigm of evolutionary biology for more than 150 years. However, the complexity of microbial evolution was not yet known at the time of Darwin’s book, for horizontal gene transfer (HGT) would not be discovered for nearly another 70 years [2]. We now have a better understanding of the complexity of microbial evolution as well as the immense diversity of microbial life, but defining exactly what constitutes a microbial species is complicated by factors such as HGT that have yet to be definitively clarified and incorporated into a complete microbial species definition.
There have been many different approaches to classifying microbial species. One of the most common methods of classifying new organisms is a polyphasic approach that takes into consideration multiple characteristics of the microbe, such as 16S rRNA sequencing, DNA-DNA hybridization, and phenotypic characterizations [3]. While each individual approach has its own weaknesses, collectively they have been used to classify many microbial species. However, with advances in genome sequencing technology it has become apparent that even within a species there can be incredible genomic diversity [4, 5]. This has led to the development of new, genomic classification techniques such as multilocus sequence analysis (MLSA), which uses the sequences of housekeeping genes to determine phylogenetic relationships [6]. One of the issues with the MLSA approach is that the threshold, or transition point, marking the boundary between species is chosen subjectively, but approaches like genealogical concordance attempt to resolve this issue by objectively delineating groups that are representative of species (Figure 1) [7]. While genealogical concordance may prove to be useful for objectively determining species boundaries, it only utilizes housekeeping genes and thus avoids the complexity of HGT and has limited ability to distinguish between ecotypes. In other words, it classifies organisms by their vertically evolved gene backbones without taking into account the variable, HGT-derived genomic islands [5].
Figure 1. Genomic species hypotheses. A standard MLSA tree creates multiple potential species hypotheses due to the subjective nature of determining the transition point between species’ boundaries (a), while genealogical concordance analyses result in objective species’ boundaries by determining the transition point between genealogical discord within species and genealogical concordance between species (b). Figure from [7].
Microbial evolution has been greatly influenced by dynamic community structures and HGT, making it possible for entire metabolic pathways to spread throughout Earth’s ecology. For example, proteorhodopsin photosystems can be horizontally transferred in a single event, allowing the recipients to use photophosphorylation to supplement their energy needs [8]. Being able to exchange the incredibly versatile, fitness-improving photosystem with a single HGT event likely explains the ecological and phylogenetic prevalence of these photosystems in nature [8]. However, at the community level it is common to find metabolic networks that are distributed amongst community members rather than equally shared. The reason for this is that there is a fitness advantage for smaller genome sizes, so organisms will reduce their replication costs by losing genes for metabolic pathways that are already supported by another organism [9, 10]. This phenomenon has been referred to as the black queen hypothesis in reference to the card game “Hearts,” where the players need to strategically avoid taking the queen of spades by tricking their opponents into taking it instead [9]. This is analogous to individual microbes passing off the production of essential metabolites to others, which has been seen in marine vitamin synthesis and trafficking [10]. Restricting a community’s metabolite production to a limited number of organisms may put the community at risk over geological time scales, but because of the flexibility HGT provides it is possible to maintain those communities when the environmental conditions fluctuate.
The Earth’s biosphere has undergone many microbially-driven changes over time, and not only has HGT been responsible for influencing the Earth’s biogeochemical cycles, but HGT has also helped maintain them over the eons. In our oceans today, microbial communities are dominated by a small number of different populations, but there are also thousands of low-abundance populations that are responsible for most of the communities’ observed phylogenetic diversity [11]. This “rare biosphere” has persisted over geological time scales and is likely responsible for episodically reshaping Earth’s biogeochemical processes [11]. These low-abundance populations can also be viewed as “guardians of metabolism,” because there is enough diversity in their populations to survive environmental perturbation and maintain fundamental biogeochemical metabolisms [12]. Also, it is possible for those metabolic systems to undergo extensive HGT and be spread throughout the biosphere and even between domains of life [12]. For example, this has been found to have occurred with the transfer of sulfate respiration between bacteria [13], methane oxidation from archaeal methanogens to bacteria [14], and nitrogenases from an archaeal source to cyanobacteria [15]. While all of these examples demonstrate how HGT has influenced and maintained Earth’s biogeochemical cycles over time, they also demonstrate how HGT has influenced the evolution and phylogenetic distribution of microbial metabolic pathways.
Defining a microbial species is complicated by HGT. Our current definitions rely on multiple approaches of classification, such as 16S rRNA sequencing, DNA-DNA hybridization, phenotypic classifications, and multiple variations of MLSA [3, 6, 7]. Each method has its drawbacks, which is why polyphasic approaches have been gaining popularity [3], but next generation sequencing (NGS) technologies have made whole genome analyses a practical option [16]. While still in their infancy, NGS technologies have drastically lowered the cost of whole genome sequencing for researchers, but there has yet to be a universally accepted consensus on how to appropriately analyze the data [16]. A popular method is the average nucleotide identity (ANI) analysis, which uses in silico comparisons of whole genomes to determine the phylogenetic relationship between species [16]. However, this method still has its own share of problems: it still requires a subjective threshold for species boundaries [16], genomic content varies within species due to HGT [17], and approximately 18% of prokaryotic species defined by a polyphasic approach suffer from anomalies when analyzed by ANI [18]. Of course, this might not be an issue of ANI being incorrect but rather the polyphasic approach being incorrect, but it serves to demonstrate the complexity of defining microbial species when there is no universally accepted method.
One thing that must be asked at this point is whether a clear microbial species definition is even necessary. This is a difficult question to answer, because determining what counts as “necessary” is sure to be subjective. I feel that a clear microbial species definition is definitely something that researchers should strive towards, because a universally accepted method would avoid classification anomalies when carrying out meta-analyses as done by Varghese et al. [18]. However, the importance of a clear microbial species definition is threatened by ANI analyses that have found certain genera that do not exhibit clustering, while unrelated, obligate pathogens that inhabit extremely similar, restricted niches do exhibit clusters [17]. This once again brings up the concept of ecotypes, which are a form of classification focusing on the ecological niches that organisms inhabit, and which can have incredible variation within species [5, 17, 19]. Another important component of niches that must be addressed is the microbial community, because it has been clearly established that microbial communities possess dynamic, distributed metabolic networks [9, 10, 20], so it does not make sense to solely classify a microbe as an individual when it truly exists as a member of a larger community. Perhaps in the future we will have a much more robust polyphasic approach to microbial classification that not only relies upon the individual microbe’s genomic and phenotypic characteristics, but also includes a separate classification of ecotype that can be shared between different microbial species and also encompasses microbial communities.
[1] Darwin C. 1859. On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London: John Murray. p 5. http://graphics8.nytimes.com/packages/images/nytint/docs/charles-darwin-on-the-origin-of-species/original.pdf
[2] Griffith F. 1928. The Significance of Pneumococcal Types. Journal of Hygiene 27:113-159. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2167760/pdf/jhyg00267-0003.pdf
[3] Prakash O, Verma M, Sharma P, Kumar M, Kumari K, Singh A, Kumari H, Jit S, Gupta SK, Khanna M, Lal R. 2007. Polyphasic approach of bacterial classification - An overview of recent advances. Indian Journal of Microbiology 47:98-108. https://doi.org/10.1007/s12088-007-0022-x
[4] Thompson JR. 2005. Genotypic Diversity Within a Natural Coastal Bacterioplankton Population. Science 307:1311-1313. http://science.sciencemag.org/content/307/5713/1311
[5] Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou S-R, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences 99:17020-17024. https://doi.org/10.1073/pnas.252529799
[6] Glaeser SP, Kämpfer P. 2015. Multilocus sequence analysis (MLSA) in prokaryotic taxonomy. Systematic and Applied Microbiology 38:237-245. https://doi.org/10.1016/j.syapm.2015.03.007
[7] Venter SN, Palmer M, Beukes CW, Chan W-Y, Shin G, Zyl EV, Seale T, Coutinho TA, Steenkamp ET. 2017. Practically delineating bacterial species with genealogical concordance. Antonie van Leeuwenhoek 110:1311-1325. https://rdcu.be/L71c
[8] Martinez A, Bradley AS, Waldbauer JR, Summons RE, Delong EF. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proceedings of the National Academy of Sciences 104:5590-5595. https://doi.org/10.1073/pnas.0611470104
[9] Morris JJ, Lenski RE, Zinser ER. 2012. The Black Queen Hypothesis: Evolution of Dependencies through Adaptive Gene Loss. mBio 3. http://doi.org/10.1128/mBio.00036-12.
[10] Giovannoni SJ. 2012. Vitamins in the sea. Proceedings of the National Academy of Sciences 109:13888-13889. https://doi.org/10.1073/pnas.1211722109
[11] Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, Arrieta JM, Herndl GJ. 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences 103:12115-12120. https://doi.org/10.1073/pnas.0605127103
[12] Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science 320:1034-1039. https://doi.org/10.1126/science.1153213
[13] Friedrich MW. 2002. Phylogenetic Analysis Reveals Multiple Lateral Transfers of Adenosine-5-Phosphosulfate Reductase Genes among Sulfate-Reducing Microorganisms. Journal of Bacteriology 184:278-289. https://www.ncbi.nlm.nih.gov/pubmed/11741869
[14] Chistoserdova L, Vorholt JA, Thauer RK, Lidstrom ME. 1998. C1 Transfer Enzymes and Coenzymes Linking Methylotrophic Bacteria and Methanogenic Archaea. Science 281:99-102. https://www.ncbi.nlm.nih.gov/pubmed/9651254
[15] Kechris KJ, Lin JC, Bickel PJ, Glazer AN. 2006. Quantitative exploration of the occurrence of lateral gene transfer by using nitrogen fixation genes as a case study. Proceedings of the National Academy of Sciences 103:9584-9589. https://doi.org/10.1073/pnas.0603534103
[16] Rosselló-Móra R, Amann R. 2015. Past and future species definitions for Bacteria and Archaea. Systematic and Applied Microbiology 38:209-216. https://doi.org/10.1016/j.syapm.2015.02.001
[17] Konstantinidis KT, Ramette A, Tiedje JM. 2006. The bacterial species definition in the genomic era. Philosophical Transactions of the Royal Society B: Biological Sciences 361:1929-1940. https://dx.doi.org/10.1098%2Frstb.2006.1920
[18] Varghese NJ, Mukherjee S, Ivanova N, Konstantinidis KT, Mavrommatis K, Kyrpides NC, Pati A. 2015. Microbial species delineation using whole genome sequences. Nucleic Acids Research 43:6761-6771. https://doi.org/10.1093/nar/gkv657
[19] Koeppel A, Perry EB, Sikorski J, Krizanc D, Warner A, Ward DM, Rooney AP, Brambilla E, Connor N, Ratcliff RM, Nevo E, Cohan FM. 2008. Identifying the fundamental units of bacterial diversity: A paradigm shift to incorporate ecology into bacterial systematics. Proceedings of the National Academy of Sciences 105:2504-2509. https://doi.org/10.1073/pnas.0712205105
[20] Zaikova E, Walsh DA, Stilwell CP, Mohn WW, Tortell PD, Hallam SJ. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172-191. https://doi.org/10.1111/j.1462-2920.2009.02058.x
Callahan BJ, Mcmurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal 11:2639-2643. https://doi.org/10.1038/ismej.2017.119
Gaudet AD, Ramer LM, Nakonechny J, Cragg JJ, Ramer MS. 2010. Small-Group Learning in an Upper-Level University Biology Class Enhances Academic Performance and Student Attitudes Toward Group Work. PLoS ONE 5. https://doi.org/10.1371/journal.pone.0015821
Hallam SJ, Torres-Beltrán M, Hawley AK. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific Data 4:170158. https://doi.org/10.1038/sdata.2017.158
Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, Rio TGD, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data 4:170160. https://doi.org/10.1038/sdata.2017.160
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology 12:118-123. https://doi.org/10.1111/j.1462-2920.2009.02051.x
Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, Arrieta JM, Herndl GJ. 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences 103:12115-12120. https://doi.org/10.1073/pnas.0605127103
Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou S-R, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences 99:17020-17024. https://doi.org/10.1073/pnas.252529799
Analysis of sequences obtained from Saanich Inlet using mothur and QIIME2 revealed peak community alpha-diversity at a depth of approximately 100 m, where the dissolved oxygen concentration is approximately 38 uM. The lowest diversity was observed at greater depths with lower oxygen levels. Taxonomic analysis revealed Proteobacteria as the most abundant phylum in all samples. To investigate how microbial communities differ across depth and oxygen gradients within Saanich Inlet, we focused on the phylum Chloroflexi. Chloroflexi abundance was positively correlated with depth and negatively correlated with oxygen concentration; both correlations were significant only when the analysis was based on QIIME2-generated ASVs, not mothur-generated OTUs. QIIME2 identified four classes within the phylum Chloroflexi (Dehalococcoidia, Anaerolineae, SAR202 and JG30-KF-CM66), while mothur identified two (Anaerolineae and SAR202). The abundances of individual OTUs and ASVs were also tested for correlation with depth and oxygen concentration, and no correlation was found to be significant. The abundances of 24 (of 34) OTUs and 38 (of 47) ASVs were positively correlated with depth, while all OTUs and 46 ASVs were negatively correlated with oxygen concentration. However, apparent outliers in the data from both mothur and QIIME2 may have biased the trend lines and reduced model significance, although the lack of sufficient data prevents us from formally classifying these points as outliers. Lastly, the abundances of some OTUs and ASVs were significantly and positively correlated with hydrogen sulfide levels. The differences observed between mothur and QIIME2 show that the choice of pipeline can affect the biological conclusions drawn from the same data.
Saanich Inlet is a seasonally anoxic fjord [1] located between Vancouver Island and the Saanich Peninsula. It is 24 km long and has a basin up to 234 meters deep [2]. A 75-meter sill isolates the deeper waters [3]. Because of this sill and the constantly high input of organic material from freshwater discharge and primary production in surface waters, conditions below 110 meters are anoxic [3]. Oxygen is replenished seasonally, mostly in the fall, which modifies the oxygen gradient and thereby the environmental conditions for the microbial communities that inhabit the inlet [3]. Dissolved oxygen increases gradually from a minimum at depth to a peak at the surface, where it is supplied by phytoplankton metabolism and gas exchange with the atmosphere [3]. Nitrate reduction by denitrifiers happens mostly in the deep water following oxygenation [3]. This results in a steep nitrate gradient across depths within the fjord [3]. A study by Zaikova et al. found that microbial diversity was highest in the hypoxic transition zone and decreased within the anoxic basin waters [1]. It is vital to study the roles of various microorganisms within Saanich Inlet in order to understand how they affect environmental processes involving greenhouse gases, methane, and denitrification on a larger scale in the world's oceans [3].
Oxygen minimum zones (OMZs) are regions of the ocean where oxygen saturation is lowest, typically occurring at depths of about 200 to 1000 meters [4]. OMZs are normally found along the western boundaries of continents, where upwelling brings nutrient-rich water from the deep ocean up to the surface [5]. They can also be found in coastal basins where restricted circulation limits mixing of deep and surface waters [5]. In addition, human activity can affect the size of periodic dead zones (in the coastal ocean and enclosed basins) through agricultural runoff that enters marine waters, a process called eutrophication [5]. Important examples include the dead zones in the Mississippi River Delta and Chesapeake Bay in the United States. Unlike these zones, Saanich Inlet in British Columbia is a glacially carved fjord whose entrance sill restricts oxygen-rich water from entering the basin [5].
Operational Taxonomic Units (OTUs) are defined as clusters of organisms grouped by DNA sequence similarity of a specific DNA segment known as a taxonomic marker gene [6]. The grouped sequences differ by less than a fixed and arbitrary sequence dissimilarity threshold, often 3% [7]. This process of clustering on a specific DNA segment, known as DNA barcoding, allows for rapid, targeted, and high-throughput analysis of genetic variation in a specific genomic region such as the 16S/18S rRNA genes, leading to large-scale characterization of microbial communities [6, 8]. More recently, amplicon sequence variant (ASV) methods have been developed that offer finer resolution and do not depend on the dissimilarity thresholds used to define OTUs. ASV methods have shown higher specificity and sensitivity than OTU methods, as they distinguish sequence variants that differ by as little as a single nucleotide and denoise the data by discriminating biological sequences from errors. This is done based on the expectation that biological sequences are more abundant and more repeatedly observed than error-containing sequences [7].
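As a rough illustration of the difference between threshold-based OTUs and exact sequence variants, the following base R sketch clusters three made-up sequences at a 3% dissimilarity threshold and, separately, counts exact unique sequences. The sequences, the helper `substitute_bases`, and the choice of average-linkage clustering are assumptions made for illustration only. Real pipelines such as mothur and QIIME2 use their own clustering and denoising algorithms, and counting unique sequences is only a stand-in for the ASV idea.

```r
# Toy data (hypothetical, not from Saanich Inlet): one 100-base reference
# sequence and two variants that differ from it at 1 and 10 positions.
set.seed(1)
ref <- sample(c("A", "C", "G", "T"), 100, replace = TRUE)

# Hypothetical helper: replace the bases at the given positions with a
# different base, so every listed position is a guaranteed mismatch.
substitute_bases <- function(s, positions) {
  s[positions] <- ifelse(s[positions] == "A", "C", "A")
  s
}

seqs <- list(
  reference      = ref,
  one_mismatch   = substitute_bases(ref, 5),
  ten_mismatches = substitute_bases(ref, seq(9, 90, by = 9))
)

# Pairwise dissimilarity = fraction of positions that differ
n <- length(seqs)
d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(seqs[[i]] != seqs[[j]])
  }
}

# OTU-like grouping: cut a hierarchical clustering at 3% dissimilarity
otu_id <- cutree(hclust(as.dist(d), method = "average"), h = 0.03)

# ASV-like grouping: every exact sequence is its own unit
seq_strings <- vapply(seqs, paste, character(1), collapse = "")
asv_id <- match(seq_strings, unique(seq_strings))

length(unique(otu_id))  # number of OTU-like clusters
length(unique(asv_id))  # number of exact variants
```

In this toy case the single-nucleotide variant is merged with the reference under the 3% threshold but kept separate when exact variants are counted, which is the resolution difference described above.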
Using OTU and ASV data for samples collected from Saanich Inlet, we investigated how microbial communities differ across depth and oxygen gradients within the inlet, with a particular focus on the phylum Chloroflexi. We found Chloroflexi of interest because its members are highly abundant in marine sediments [9] and present a broad spectrum of metabolic characteristics such as anoxygenic photosynthesis [10], obligate aerobic and anaerobic heterotrophy [11], and even predation with a gliding motility [12]. Like many other microbes, members of Chloroflexi can be difficult to grow in culture, with some classes yet to be cultured successfully, which has made characterizing their metabolisms a challenge [13, 21]. However, new sequencing technologies have made it possible to analyze the genomes of these uncultured microbes and characterize them [13-21].
In our Saanich Inlet data, we were able to identify four classes within the phylum Chloroflexi: Dehalococcoidia, Anaerolineae, SAR202, and JG30-KF-CM66. Members of the class Dehalococcoidia are widely distributed throughout marine sediments [13] and anoxic deep waters [14]. Dehalococcoidia grow via anaerobic organohalide respiration and are extensively studied for their potential in the bioremediation of chloride-contaminated water and soil [13, 14]. As for the class Anaerolineae, despite its members being prevalent in various ecosystems, only a few strains have been successfully cultured [15]. Anaerolineae compose one of the core populations of anaerobic bacteria involved in anaerobic digestion and possess key genes for catalyzing cellulose hydrolysis [16]. The SAR202 cluster was one of the earliest discoveries of marine bacteria which inhabited the aphotic zone [17], and since then SAR202 has been found to be ubiquitous throughout the deep ocean [18]. Members of SAR202 are involved in metabolizing organosulfur compounds and likely play a major role in sulfur cycling [19]. JG30-KF-CM66 is a relatively uncharacterized clade of acidobacteria, but it has been identified in soft coal slags [20] and anoxic ocean water [21]. The characteristics of each of these classes impact how the spatial distribution of the Chloroflexi phylum differs within Saanich Inlet. We also set out to determine if and how different sequence processing pipelines, specifically mothur and QIIME2, would impact these biological conclusions.
Water samples from 16 depths (10-200 m) from cruise 72 were collected at station S3 (48°35.500 N, 123°30.300 W) onboard the MSV John Strickland. Geochemical and multi-omic information, which included 16S rRNA gene amplicon sequences (V4-5 hypervariable regions) and dissolved O2, was extracted for each depth [22, 23]. Data from 7 depths (10, 100, 120, 135, 150, 165, 200 m) were further analyzed. Dissolved O2 was measured onboard with a Sea-Bird SBE 43 dissolved oxygen sensor [22]. At each depth, 2 L of water were filtered onto 0.22 μm Sterivex filters and stored until amplicon sequencing, which was carried out on the Illumina MiSeq platform at the Joint Genome Institute. Base qualities were encoded in Phred33, and primers 515F and 806R were used for 16S rRNA gene amplification [23].
Reads were independently processed using mothur and QIIME2 based pipelines, which cluster sequences based on OTUs and ASVs, respectively. Resultant data were constructed as phyloseq objects for downstream analysis in R. For a comprehensive outline and description of commands used for each pipeline, refer to the R markdown documents provided by Kim Dill-McFarland (2018; mothur pipeline) and Julia Beni (2018; QIIME2 pipeline).
Sequenced reads were first assembled into contigs, which were screened and de-duplicated so that the remainder 1) were between 200bp and 600bp long, 2) had fewer than 8 homopolymers and 3) had no ambiguous bases.
Contigs were trimmed such that they would only align to bases 10368 to 25434 in the SILVA database (release 128), and uninformative bases were removed. Resultant sequences were de-duplicated again.
Sequences were then pre-clustered and clustered de novo (using 97% sequence similarity) to determine the final OTUs. Chimeric sequences were also filtered away.
Clusters were classified using the SILVA database, and resulting taxonomies were condensed.
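The contig screening criteria in the mothur steps above can be sketched in base R. This is only an illustration of the filtering logic, not the mothur implementation: the function `screen_contigs` and its parameter names are hypothetical, and the homopolymer criterion is read here as "no run of a single base that is `homop_limit` bases or longer", which is an assumption about what the original threshold means.

```r
# Hypothetical helper that mimics the screening logic on plain strings.
# Defaults follow the thresholds quoted in the text (200-600 bp, limit of 8).
screen_contigs <- function(seqs, min_len = 200, max_len = 600, homop_limit = 8) {
  # 1) keep sequences within the allowed length range
  len_ok <- nchar(seqs) >= min_len & nchar(seqs) <= max_len
  # 2) reject sequences containing anything other than A, C, G or T
  no_ambiguous <- !grepl("[^ACGT]", seqs)
  # 3) reject sequences with a single-base run of homop_limit or more
  run_pattern <- sprintf("([ACGT])\\1{%d,}", homop_limit - 1)
  homop_ok <- !grepl(run_pattern, seqs, perl = TRUE)
  # de-duplicate what remains
  unique(seqs[len_ok & no_ambiguous & homop_ok])
}

# Toy sequences with deliberately small thresholds so each rule is exercised
toy <- c("ACGTACGT",  # passes
         "ACGTACGT",  # duplicate of the first
         "ACGNACGT",  # ambiguous base
         "ACG",       # too short
         "AAAAACGT")  # homopolymer run of 5
kept <- screen_contigs(toy, min_len = 5, max_len = 12, homop_limit = 4)
kept
```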
Reads were demultiplexed and imported into QIIME2. Per-base read qualities were visualized to determine downstream trim parameters.
ASVs were generated with the DADA2 protocol, using custom trim parameters for quality control. The resulting ASVs were classified against the SILVA database (release 119) at a 99% similarity threshold.
The ASV table was then converted to text format and used to create a phyloseq object.
Analysis was completed in R v3.4.3 [24] using the following packages.
library(tidyverse)
library(phyloseq)
library(ggplot2)
library(dplyr)
library(stringr)
library(magrittr)
library(knitr)
library(gridExtra)
library(grid)
library(randomcoloR)
Alpha-diversities of clusters identified by mothur and QIIME2 from each sample were measured by the Shannon diversity index and the Chao1 richness estimator. Alpha-diversities were plotted against sample depth and oxygen concentration for both clustering methods, and were fitted using local polynomial regression models where appropriate. Relative abundances of all phylum level classifications produced by mothur and QIIME2 were also plotted for each sample.
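To make the two alpha-diversity measures concrete, here is a minimal base R sketch of the Shannon index and a Chao1 estimate computed from a vector of counts. The functions `shannon` and `chao1` and the example counts are hypothetical. The Chao1 form used is the bias-corrected one, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singletons and doubletons; it is assumed here for illustration and may not match every implementation exactly.

```r
# Shannon diversity index: H = -sum(p * ln(p)) over taxa with non-zero counts
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Bias-corrected Chao1 richness estimate (assumed form, see text)
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)   # observed number of taxa
  f1 <- sum(counts == 1)     # singletons
  f2 <- sum(counts == 2)     # doubletons
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Hypothetical count vector for one sample
example_counts <- c(10, 5, 1, 1, 2)
shannon(example_counts)
chao1(example_counts)
```

Chao1 depends only on how many taxa are seen and how many of them are rare, while Shannon weights every taxon by its relative abundance, which is why the two measures can rank the same samples differently.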
Relative abundances of Chloroflexi OTUs and ASVs amongst all clusters were plotted, both as a whole and individually, across depth and oxygen gradients. Significances of correlations between these variables were based on linear regression models, as all variables are continuous and there is a lack of evidence to suggest curvilinear relationships between them.
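The significance test described here amounts to fitting a simple linear model and reading the slope row of its coefficient table. The sketch below uses the seven sampled depths with made-up abundance values, so the numbers it produces are illustrative only and are not results from this study; the names `toy`, `fit` and `slope_stats` are hypothetical.

```r
# Seven sampled depths from the text, paired with hypothetical abundances (%)
toy <- data.frame(
  depth_m       = c(10, 100, 120, 135, 150, 165, 200),
  abundance_pct = c(0.1, 1.5, 2.0, 2.4, 3.1, 3.0, 4.2)
)

# Fit abundance as a linear function of depth
fit <- lm(abundance_pct ~ depth_m, data = toy)

# The slope row holds the estimate, its standard error, the t value and the
# p-value used to judge whether the correlation is significant
slope_stats <- summary(fit)$coefficients["depth_m", ]
slope_stats
```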
# Generate random colours for use in figures.
palette <- distinctColorPalette(40)
Data were loaded into R and samples normalized to 100,000 sequences per sample.
load("mothur_phyloseq.RData")
load("qiime2_phyloseq.RData")
# Random seed set for reproducibility
set.seed(4831)
# Data normalized to 100,000 sequences per sample
m.norm = rarefy_even_depth(mothur, sample.size=100000)
q.norm = rarefy_even_depth(qiime2, sample.size=100000)
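Conceptually, normalizing to an even depth means randomly subsampling each sample's reads, without replacement, down to a common total. The base R sketch below shows that idea on a single made-up count vector, followed by the conversion to percentages. The helper `rarefy_counts` is hypothetical and is not how phyloseq implements `rarefy_even_depth` internally.

```r
# Hypothetical helper: subsample a count vector to a fixed number of reads
rarefy_counts <- function(counts, depth) {
  stopifnot(sum(counts) >= depth)
  # expand counts into one entry per read, labelled by taxon index
  reads <- rep(seq_along(counts), times = counts)
  # draw `depth` reads without replacement
  kept <- reads[sample.int(length(reads), size = depth)]
  # count the reads kept for each taxon
  tabulate(kept, nbins = length(counts))
}

set.seed(4831)
sample_counts <- c(500, 300, 150, 50)   # made-up counts for four taxa
rarefied <- rarefy_counts(sample_counts, depth = 100)

# Relative abundance in percent, as done after normalization
rarefied_pct <- 100 * rarefied / sum(rarefied)
```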
Relative abundance percentages were calculated for the data.
m.percent = transform_sample_counts(m.norm, function(x) 100 * x/sum(x))
q.percent = transform_sample_counts(q.norm, function(x) 100 * x/sum(x))
The phylum Chloroflexi was chosen.
phylum_name_mothur = "Chloroflexi"
phylum_name_qiime2 = "D_1__Chloroflexi"
Shannon diversity index and Chao1 were calculated for the total microbial community across depth and oxygen concentration gradients for both mothur and QIIME2.
# Alpha-diversity of total community for mothur
# Calculate Chao1 and Shannon
m.alpha = estimate_richness(m.norm, measures = c("Chao1", "Shannon"))
# Combine Chao1 and Shannon data with the rest of the geochemical data into 1 data frame
m.meta.alpha = full_join(rownames_to_column(m.alpha),
rownames_to_column(data.frame(m.percent@sam_data)), by = "rowname")
# Save plots for the different combinations (Shannon vs Chao1 across depth vs oxygen)
m.shannon.depth.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Shannon)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Shannon)) +
labs(title="mothur", y="Shannon diversity index", x=NULL)
m.chao1.depth.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Chao1)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Chao1)) +
labs(title="mothur", y="Chao1 richness estimator", x="Depth (m)")
m.shannon.o2.plot <- m.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Shannon), width = 5, shape = 1) +
labs(title="mothur", y="Shannon diversity index", x=NULL)
m.chao1.o2.plot <- m.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Chao1), width = 5, shape = 1) +
labs(title="mothur", y="Chao1 richness estimator", x="Oxygen (uM)")
# Alpha-diversity of total community for QIIME2
# Calculate Chao1 and Shannon
q.alpha = estimate_richness(q.norm, measures = c("Chao1", "Shannon"))
# Combine Chao1 and Shannon data with the rest of the geochemical data into 1 data frame
q.meta.alpha = full_join(rownames_to_column(q.alpha),
rownames_to_column(data.frame(q.percent@sam_data)), by = "rowname")
# Save plots for the different combinations (Shannon vs Chao1 across depth vs oxygen)
q.shannon.depth.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Shannon)) +
geom_smooth(method='loess', aes(x=as.numeric(Depth_m), y=Shannon)) +
labs(title="QIIME2", y=NULL, x=NULL)
q.chao1.depth.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Chao1)) +
geom_smooth(method='loess', aes(x=as.numeric(Depth_m), y=Chao1)) +
labs(title="QIIME2", y=NULL, x="Depth (m)")
q.shannon.o2.plot <- q.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Shannon), width = 5, shape = 1) +
labs(title="QIIME2", y=NULL, x=NULL)
q.chao1.o2.plot <- q.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Chao1), width = 5, shape = 1) +
labs(title="QIIME2", y=NULL, x="Oxygen (uM)")
# Plotting alpha-diversities versus depth
grid.arrange(m.shannon.depth.plot, q.shannon.depth.plot, m.chao1.depth.plot, q.chao1.depth.plot, ncol=2)
Figure 1 Alpha-diversity of samples collected at Saanich Inlet across depth. Sequence processing was done with both mothur and QIIME2. Points were fitted using local polynomial regression fitting. 95% confidence band is displayed in grey.
The same patterns of alpha-diversity (Shannon diversity index and the Chao1 richness estimator) can be observed across depth for both mothur and QIIME2 (Figure 1). Diversity is slightly lower in surface waters (10 m) than at 100 m depth. Peak diversity is reached at ~100-120 m, then diversity decreases with greater depth, with a slight increase at 200 m for all measures except the Shannon diversity index for QIIME2.
Note, however, that despite the similar alpha-diversity patterns, the two pipelines differ in magnitude: across all depths, the mothur OTU analysis yielded a lower alpha-diversity than the QIIME2 ASV analysis when measured with the Shannon diversity index, and a higher alpha-diversity when measured with Chao1.
# Plotting alpha-diversities versus oxygen concentration
grid.arrange(m.shannon.o2.plot, q.shannon.o2.plot, m.chao1.o2.plot, q.chao1.o2.plot, ncol=2)
Figure 2 Alpha-diversity of samples collected at Saanich Inlet across oxygen concentration. Sequence processing was done with both mothur and QIIME2. Points are jittered to show 2 sets of 2 closely overlapping points revealed by the mothur pipeline.
Looking at Shannon diversity across oxygen concentration (Figure 2), we find that at equivalent depths QIIME2 has a greater diversity than mothur. However, the pattern exhibited by both mothur and QIIME2 data is still similar. The three lowest diversity points are at an oxygen concentration of 0 uM, while the highest diversity is found at an oxygen concentration of ~38 uM. The band of 95% confidence intervals was not plotted due to the lack of data between ~38 uM and ~217 uM of oxygen.
Comparing Chao1 at different oxygen levels for mothur and QIIME2 shows that the patterns somewhat differ. While the three lowest diversity points are still at 0 uM of oxygen, for mothur the highest diversity in terms of Chao1 is at an oxygen concentration of ~38 uM, while for QIIME2 it is at an oxygen concentration of ~32 uM. For both, oxygen concentration of ~217 uM shows a notable decrease in diversity. Chao1 exhibited a relatively greater drop at ~217 uM of oxygen compared to Shannon.
# Save plot for phylum abundance composition for mothur
m.phyla.plot = m.percent %>%
plot_bar(fill="Phylum")+
geom_bar(aes(fill=Phylum), stat="identity")+
labs(y="Abundance (%)")+
scale_fill_manual(values=palette)+
theme(legend.text=element_text(size=11))
# Save plot for phylum abundance composition for QIIME2
q.phyla.plot = q.percent %>%
plot_bar(fill="Phylum")+
geom_bar(aes(fill=Phylum), stat="identity")+
labs(y="Abundance (%)")+
scale_fill_manual(values=palette)+
theme(legend.text=element_text(size=11))
Figure 3 Relative distribution of phyla across samples from Saanich Inlet when clustered as OTUs using mothur.
Figure 4 Relative distribution of phyla across samples from Saanich Inlet when clustered as ASVs using QIIME2.
Mothur and QIIME2 identified 28 and 29 taxa, respectively, at the phylum level (Figures 3 & 4). In both pipelines, four of these phyla dominated the community composition in terms of abundance: Proteobacteria, Bacteroidetes, Thaumarchaeota and Actinobacteria (from most to least abundant). Other noticeably abundant phyla include Cyanobacteria, Deferribacteres, Euryarchaeota, Firmicutes, Gemmatimonadetes, Marinimicrobia, Nitrospinae, Planctomycetes and Verrucomicrobia. Our phylum of interest, Chloroflexi, makes up 0 to 6% of the microbial community in the collected samples, depending on depth.
# Get summary of linear model statistics for Chloroflexi abundance across depth for mothur
m.chlor.lm = m.norm %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ Depth_m, .) %>%
summary()
# Get summary of linear model statistics for Chloroflexi abundance across depth for QIIME2
q.chlor.lm = q.norm %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ Depth_m, .) %>%
summary()
# Make a data frame for linear model statistics data for depth
taxon.abundance = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance <- rbind(taxon.abundance, m.chlor.lm$coefficients["Depth_m",])
taxon.abundance <- rbind(taxon.abundance, q.chlor.lm$coefficients["Depth_m",])
rownames(taxon.abundance) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Make a table for the data
kable(taxon.abundance,caption="Table 1 Correlation Data of Chloroflexi Phylum across Depth")
| | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| mothur | 1.327529 | 0.6389862 | 2.077554 | 0.0923485 |
| QIIME2 | 2.622128 | 0.4212043 | 6.225311 | 0.0015644 |
# Filter our phylum, group by sample and summarize abundance across depth for mothur
m.abd.depth.plot <- m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>%
# Use the data in a plot (abundance versus depth)
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
labs(title="mothur", y="Abundance (%)", x="Depth (m)")
# Filter our phylum, group by sample and summarize abundance across depth for QIIME2
q.abd.depth.plot <- q.percent %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>%
# Use the data in a plot (abundance versus depth)
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
labs(title="QIIME2", y=NULL, x="Depth (m)")
# Get summary of linear model statistics for Chloroflexi abundance across oxygen concentrations for mothur
m.chlor.lm.ox = m.norm %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ O2_uM, .) %>%
summary()
# Get summary of linear model statistics for Chloroflexi abundance across oxygen concentrations for QIIME2
q.chlor.lm.ox = q.norm %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ O2_uM, .) %>%
summary()
# Make a data frame for linear model statistics data for oxygen concentrations
taxon.abundance.ox = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance.ox <- rbind(taxon.abundance.ox, m.chlor.lm.ox$coefficients["O2_uM",])
taxon.abundance.ox <- rbind(taxon.abundance.ox, q.chlor.lm.ox$coefficients["O2_uM",])
rownames(taxon.abundance.ox) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance.ox) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Make a table for the data
kable(taxon.abundance.ox,caption="Table 2 Correlation Data of Chloroflexi Phylum across Oxygen Concentration")
| | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| mothur | -0.750471 | 0.5865861 | -1.279387 | 0.2569128 |
| QIIME2 | -1.731996 | 0.5762708 | -3.005525 | 0.0299088 |
# Filter our phylum, group by sample and summarize abundance across oxygen concentration for mothur
m.abd.o2.plot <- m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>%
# Use the data in a plot (abundance versus oxygen)
ggplot() +
geom_point(aes(x=O2_uM, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
labs(title="mothur", y="Abundance (%)", x="O2 (uM)")
# Filter our phylum, group by sample and summarize abundance across oxygen concentration for QIIME2
q.abd.o2.plot <- q.percent %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>%
# Use the data in a plot (abundance versus oxygen)
ggplot() +
geom_point(aes(x=O2_uM, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
labs(title="QIIME2", y=NULL, x="O2 (uM)")
# Plotting Chloroflexi abundance across depth
grid.arrange(m.abd.depth.plot, q.abd.depth.plot, ncol=2)
Figure 5 Relative abundance of Chloroflexi across depth for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey.
# Plotting Chloroflexi abundance across oxygen concentration
grid.arrange(m.abd.o2.plot, q.abd.o2.plot, ncol=2)
Figure 6 Relative abundance of Chloroflexi across oxygen concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey.
Linear regression analysis of Chloroflexi relative abundance across depth revealed variations between mothur's OTU and QIIME2's ASV clustering (Figure 5). Abundance of ASV clusters revealed a significant correlation with depth (p<0.05), while OTU clusters did not (Table 1). Both correlations were found to be positive.
Similarly, linear regression analysis of Chloroflexi relative abundance across oxygen concentration (Figure 6) revealed a significant correlation of oxygen concentration with ASV clusters (p<0.05), but not with OTU clusters (Table 2). Both correlations were found to be negative.
# Get OTUs taxa
m.tax_table = data.frame(m.norm@tax_table)
# Filter out OTUs with our phylum name
m.filtered = m.tax_table %>%
rownames_to_column('OTU') %>%
filter(Phylum==phylum_name_mothur) %>%
column_to_rownames('OTU')
# Get total number of OTUs in Chloroflexi
m.rownumber = nrow(m.filtered)
# Get names of classes in OTUs
m.classes = m.filtered %>%
select('Class') %>%
unique %>%
summarise(Classes = toString(Class))
# Get ASVs taxa
q.tax_table = data.frame(q.norm@tax_table)
# Filter out ASVs with our phylum name
q.filtered = q.tax_table %>%
rownames_to_column('ASV') %>%
filter(Phylum==phylum_name_qiime2) %>%
column_to_rownames('ASV')
# Get total number of ASVs in Chloroflexi
q.rownumber = nrow(q.filtered)
# Get names of classes in ASVs
q.classes = q.filtered %>%
select('Class') %>%
unique %>%
summarise(Classes = toString(Class))
For Chloroflexi, the number of OTUs was found to be 34, and the number of ASVs was found to be 47. The OTUs represent classes: SAR202_clade, Anaerolineae, while the ASVs represent classes: D_2__JG30-KF-CM66, D_2__Anaerolineae, D_2__uncultured, D_2__Dehalococcoidia, D_2__SAR202 clade.
# Example for linear model
# Make data frame that holds linear model statistics data (OTUs depth)
otu_stats = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
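# Melt the phyloseq object once, outside the loop, so the conversion is not repeated for every OTU
m.melted = psmelt(m.norm)
for (otu in row.names(m.filtered)){
  # Fit abundance against depth for this OTU and keep the slope statistics
  linear_fit = m.melted %>%
    filter(OTU==otu) %>%
    lm(Abundance ~ Depth_m, .) %>%
    summary()
  otu_data = linear_fit$coefficients["Depth_m",]
  otu_stats <- rbind(otu_stats, otu_data)
}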
colnames(otu_stats)<- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Add OTU names
row.names(otu_stats) <- row.names(m.filtered)
# Add class and genus names
otu_stats = cbind(data.frame(Class = m.filtered$Class), Genus = m.filtered$Genus, otu_stats)
# Sort data frame by linear model slope (estimate)
sorted = arrange(rownames_to_column(otu_stats),Estimate)%>% column_to_rownames(var="rowname")
# Save data table in variable
lm.depth.otus = kable(sorted,caption="Table A1 Correlation data of Chloroflexi OTUs Abundance with Depth")
# Example for correlation graph
# Correlation graph for Chloroflexi OTUs percentage abundance across depth, first filtered for Chloroflexi, then plotted
m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance)) +
geom_smooth(method='lm', aes(x=Depth_m, y=Abundance)) +
facet_wrap(~OTU, scales="free_y") +
labs(x = "Depth (m)", y = "Abundance (%)") +
theme(axis.text.x = element_text(angle = 90))
Figure 7 Relative abundances of Chloroflexi OTUs across depth for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A1.
Figure 8 Relative abundances of Chloroflexi ASVs across depth for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A2.
Figure 9 Relative abundances of Chloroflexi OTUs across oxygen concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A3.
Figure 10 Relative abundances of Chloroflexi ASVs across oxygen concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A4.
Linear models were fitted to the abundance of each OTU and ASV as a function of depth and of oxygen concentration (Appendix A, Tables A1-A4) and subsequently plotted (Figures 7-10). No significant correlations were found between the abundance of any individual OTU or ASV and depth or oxygen concentration (p > 0.05 for all).
Although none of the correlations were significant, mothur and QIIME2 showed similar trends. For mothur, ten of the 34 OTUs were negatively correlated with depth and the rest positively; for QIIME2, nine of the 47 ASVs were negatively correlated with depth and the rest positively. For abundance versus oxygen concentration, all mothur OTUs and all but one of the QIIME2 ASVs showed a negative correlation.
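The per-taxon fits described above can be sketched in base R as follows. This is a minimal illustration rather than the exact code behind Tables A1-A4: the data frame `toy`, its values, and the helper `fit_slope` are hypothetical, and only the column names `OTU`, `Depth_m`, and `Abundance` are taken from the plotting code shown earlier.

```r
# Hypothetical long-format data, shaped like psmelt() output: one row per
# taxon per sample. The values are made up for illustration only.
toy <- data.frame(
  OTU       = rep(c("OtuA", "OtuB"), each = 7),
  Depth_m   = rep(c(10, 100, 120, 135, 150, 165, 200), times = 2),
  Abundance = c(0.00, 0.10, 0.30, 0.20, 0.50, 0.40, 0.60,
                0.90, 0.20, 0.40, 0.10, 0.30, 0.00, 0.10)
)

# Fit Abundance ~ Depth_m for one taxon and return the slope row of the
# coefficient table (Estimate, Std. Error, t value, Pr(>|t|)).
fit_slope <- function(df) {
  fit <- lm(Abundance ~ Depth_m, data = df)
  coef(summary(fit))["Depth_m", ]
}

# One model per taxon, bound into a single table sorted by slope estimate.
slopes <- do.call(rbind, lapply(split(toy, toy$OTU), fit_slope))
slopes <- slopes[order(slopes[, "Estimate"]), , drop = FALSE]
print(slopes)

# Count how many taxa have a negative slope, as summarized in the text.
n_negative <- sum(slopes[, "Estimate"] < 0)
print(n_negative)
```

The sign of the `Estimate` column gives the direction of the trend for each taxon, and the last column is the unadjusted p-value for the slope.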
Figure 11 Relative abundances of Chloroflexi OTUs across hydrogen sulfide concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. Significance of each fit can be found in Table A5.
Figure 12 Relative abundances of Chloroflexi ASVs across hydrogen sulfide concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. Significance of each fit can be found in Table A6.
Linear models were fitted to the abundance of each OTU and ASV as a function of hydrogen sulfide concentration (Appendix A, Tables A5 & A6) and subsequently plotted (Figures 11 & 12). After p-value adjustment, significant positive correlations between abundance and hydrogen sulfide concentration (p < 0.05) were found for 15 individual OTUs and 19 individual ASVs. Classes associated with the OTUs and ASVs that correlate positively with hydrogen sulfide concentration include Anaerolineae, Dehalococcoidia and the SAR202 clade.
QIIME2 identified 9 more individual members of Chloroflexi that significantly and positively correlate with hydrogen sulfide concentration compared to mothur. In addition, the significance of the results and the positive correlation with hydrogen sulfide concentration were generally greater with QIIME2 than mothur.
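The adjustment step can be illustrated with base R's `p.adjust`. This is a sketch under assumptions: the p-values below are hypothetical and are not taken from Tables A5 or A6, and `method = "fdr"` (the Benjamini-Hochberg procedure) is assumed from the "FDR Adjusted p-value" column heading rather than confirmed by the analysis code.

```r
# Hypothetical raw p-values, one per taxon (not taken from Tables A5 or A6).
p_raw <- c(Otu1 = 0.001, Otu2 = 0.010, Otu3 = 0.040, Otu4 = 0.200, Otu5 = 0.600)

# Benjamini-Hochberg false discovery rate adjustment (stats::p.adjust).
p_fdr <- p.adjust(p_raw, method = "fdr")

# Taxa that remain significant at the 0.05 level after adjustment.
significant <- names(p_fdr)[p_fdr < 0.05]
print(round(p_fdr, 4))
print(significant)
```

In this toy example the third taxon has a raw p-value below 0.05 but does not survive the adjustment, which is why counts of significant taxa should be based on the adjusted column.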
Saanich Inlet provides a good model for the study of oxygen minimum zones and the microbial dynamics that shape them. The microbial diversity analyses we carried out with the two bioinformatic pipelines, mothur and QIIME2, produced mostly similar patterns but differed in the details. Both pipelines placed the peak Shannon diversity index at 100 m; however, the peak Chao1 richness estimate for the total community was at 100 m for mothur and 120 m for QIIME2. The same discrepancy between the mothur and QIIME2 Chao1 richness estimates appears when alpha-diversity is examined across dissolved oxygen concentrations (Figure 2). As for discrepancies between Shannon diversity and Chao1 richness, at a dissolved oxygen concentration of ~217 μM Chao1 was lower relative to Shannon for both mothur and QIIME2. This could indicate increased species evenness despite reduced species richness, because Chao1 does not take evenness into account while Shannon does. The two pipelines also agreed that Saanich Inlet is dominated by only 4-5 phyla, with the phylum Chloroflexi making up 0-6% of the microbial community depending on the sample depth.
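To make the distinction between the two indices concrete, the sketch below computes both from raw counts in base R. It is independent of whichever package produced the values in this report; the functions `shannon` and `chao1` and the two example communities are hypothetical, and the Chao1 formula used is the common bias-corrected form.

```r
# Shannon diversity: H' = -sum(p_i * log(p_i)) over taxa with nonzero counts.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
# where F1 and F2 are the numbers of singleton and doubleton taxa.
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Two hypothetical communities with the same total count (40 reads).
even_poor   <- c(10, 10, 10, 10)          # few taxa, perfectly even
uneven_rich <- c(33, 1, 1, 1, 1, 2, 1)    # more taxa, one dominant

print(c(shannon = shannon(even_poor),   chao1 = chao1(even_poor)))
print(c(shannon = shannon(uneven_rich), chao1 = chao1(uneven_rich)))
```

The even community has the lower Chao1 estimate but the higher Shannon index, which mirrors the pattern discussed above: Shannon rewards evenness, whereas Chao1 responds only to the number of taxa and to rare taxa.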
The phylum Chloroflexi contains bacteria with diverse metabolic characteristics, such as aerobic thermophiles, anoxygenic phototrophs, and anaerobic halorespirers, which use halogenated organics as electron acceptors [10, 11]. In our analysis, we found that Chloroflexi abundance was positively correlated with depth (Table 1). This may be a result of oxygen concentration decreasing with depth: a negative correlation between Chloroflexi abundance and oxygen concentration was also found (Table 2). Interestingly, these results were significant only in the QIIME2 analysis, not in the mothur analysis. This discrepancy is discussed later and shows that, in this case, the analysis method influences whether the results are significant. The results indicate a preference for members of Chloroflexi to inhabit anoxic habitats at depth within Saanich Inlet, which is supported by previously mentioned research on this phylum [11, 16, 21]. Because Chloroflexi encompasses vastly different classes of bacteria with disparate metabolic behavior, it is important to note that the classes within this phylum differ in their oxygen requirements, and some of them also thrive in oxic environments [11, 16, 21]. This may explain why the abundance vs. depth and oxygen concentration results with mothur were not significant (Tables 1 & 2). However, we identified only a few classes within Chloroflexi in our data, and each of those classes appears to be anaerobic [14-18], so the potential outliers might result from something else, such as other nutrients. Anoxic oceanic zones such as the deep waters of Saanich Inlet may provide an environment in which anaerobic members of Chloroflexi can be more competitive than other phyla and dominate the microbial population.
The richness observed within Chloroflexi depends strongly on which bioinformatic pipeline is used for the analysis. Mothur identified only 34 OTUs occupying 2 classes within Chloroflexi, while QIIME2 identified 47 ASVs and 5 classes. However, QIIME2 was unable to identify any genera while mothur was (Tables A1-A3). Therefore, the required taxonomic depth may dictate which pipeline should be used in future analyses. Correlations of individual OTUs and ASVs within Chloroflexi with depth or oxygen concentration were not significant (Appendix A, Tables A1-A4). The linear models in Figures 7-10 show similar trends for the individual OTUs and ASVs with depth and oxygen concentration; however, there are glaring single outliers that defy any correlation among the data points. These outliers are present in all analyses but do not occur at the same depths or oxygen levels for the various ASVs and OTUs studied, which may mean that there is no pattern or rationale for their occurrence. More data would make it possible either to legitimize these data points and integrate them into the results or to confirm them as outliers and exclude them.
Since members of the Chloroflexi phylum have been associated with sulfur cycling, the hydrogen sulfide levels in Saanich Inlet were investigated in relation to the abundance of members of this phylum [19]. Strong, significant positive correlations were found between various members of the Chloroflexi phylum and hydrogen sulfide concentration. Classes correlating with hydrogen sulfide levels include Anaerolineae, Dehalococcoidia and the SAR202 clade (Figures 11 & 12; Appendix A, Tables A5 & A6). These results are supported by previous studies that have identified members of the SAR202 cluster, which belongs to the Chloroflexi phylum, as playing a major role in the sulfur cycle in the dark water column. Some of these members have pathways for sulfur reduction and could be responsible for the hydrogen sulfide concentrations at depth. As previously stated, they also have the potential to metabolize a variety of organosulfur compounds [19]. In addition, single-cell genomics studies of members of the Dehalococcoidia class within the Chloroflexi phylum have highlighted their association with marine sediments and sulfur cycling [13].
Differences between bioinformatic pipelines for microbial ecology data analysis may lead to different analytical outcomes, and can result in misidentifying species in different habitats or incorrectly determining trends. As we observed in this study, although mothur and QIIME2 often produced similar patterns, they did not agree on details. While QIIME2 led us to conclude that four classes (plus one uncultured class) of Chloroflexi were present in our samples, mothur identified only two. These differences are concerning: depending on the pipeline we use, we might miss organisms we have collected, or we might identify organisms that are in reality not there. Consequently, we could draw the wrong conclusions about ecosystems and the interplay of their inhabitants. Furthermore, whether we find significant correlations can also depend on the pipeline used. While Chloroflexi abundance was found to correlate significantly with depth and oxygen concentration for QIIME2, the correlation was not significant for mothur. In fact, the significance levels differed by an order of magnitude. This emphasizes the concern that the significance of findings may rest upon the choice of pipeline and clustering paradigm, leading to false positives and false negatives. It also highlights the importance of developing objective metrics to gauge the accuracy of both clustering paradigms.
The differences we observed between mothur and QIIME2 when analysing the reads in this study can also be seen in other research. For instance, a study of the composition of the chicken cecum microbiome by Allali et al. revealed lower phylogenetic diversity (PD) values when UPARSE pipelines were used than when de novo or open-reference QIIME pipelines were used [25]. However, Species Richness (S) values were comparable across pipelines. In addition, the number of assigned sequences for different sequencing platform runs was affected by the pipeline used for OTU picking (de novo vs. open-reference QIIME). The QIIME pipelines generated different relative abundances of specific genera in comparison to UPARSE. Moreover, differences in the detection profiles, such as the number of unique species, were observed when using different pipelines. The number of OTUs and taxonomic assignments produced and identified differed between pipelines with a 99% similarity threshold.
Since multiple classes were identified within the phylum Chloroflexi and the direction of correlation with depth varied among several clusters, future analyses may find more meaning at a sub-phylum level. Additionally, more data were obtained than were analyzed in the current report. In the future, one could look for correlations of abundance with other factors, such as temperature or salinity, not just depth, oxygen concentration, and hydrogen sulfide. Furthermore, there is a gap in the data between the depths of 10 m and 100 m, which makes it difficult to determine correlations. More consistent data collection, with more samples taken at more regular intervals, could help alleviate such problems and potentially reveal significant correlations. Analysis of data from samples collected over time could be interesting for exploring how the diversity in the area changes over seasons, or over longer time periods such as decades. For any unknown or poorly known organisms, it could also be of interest to examine their genetic makeup in more detail in order to determine what roles, if any, they might play in biogeochemical cycles.
[1] Zaikova E., Walsh DA, Stilwell CP, Mohn WW, Tortell PD, Hallam SJ. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172-191.
[2] Herlinveaux RH. 2011. Journal of the Fisheries Research Board of Canada 19: 1-37.
[3] 2012. Saanich Inlet. MicrobeWiki.
[4] Oxygen Minimum Zones. Keil Lab: Aquatic Organic Geochemistry, UW Oceanography.
[5] 2014. OMZ Microbes - A SCOR working group.
[6] Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E. 2005. Defining operational taxonomic units using DNA barcode data. Philosophical Transactions of the Royal Society B: Biological Sciences 360: 1935-1943.
[7] Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker gene data analysis. The ISME Journal 11: 2639-2643.
[8] Schmidt TSB, Rodrigues JFM, Christian M. 2014. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale. PLoS Comput Biol. 10.
[9] Wang, Y., Sheng, H., He, Y., Wu, J., Jiang, Y., Tam, N. F., & Zhou, H. 2012. Comparison of the levels of bacterial diversity in freshwater, intertidal wetland, and marine sediments by using millions of illumina tags. Applied and Environmental Microbiology 78: 8264-8271. 10.1128/AEM.01821-12
[10] Thiel V, Hamilton TL, Tomsho LP, Burhans R, Gay SE, Schuster SC, et al. 2014. Draft genome sequence of a sulfide-oxidizing, autotrophic filamentous anoxygenic phototrophic bacterium, Chloroflexus sp. strain MS-G (Chloroflexi). Genome Announc 2: 9-10.
[11] Sekiguchi Y, Yamada T, Hanada S, Ohashi A, Harada H, Kamagata Y. 2003. Anaerolinea thermophila gen. nov., sp. nov. and Caldilinea aerophila gen. nov., sp. nov., novel filamentous thermophiles that represent a previously uncultured lineage of the domain bacteria at the subphylum level. Int J Syst Evol Microbiol 53: 1843-51.
[12] Kiss H, Nett M, Domin N, Martin K, Maresca JA, Copeland A, Lapidus A, Lucas S, Berry KW, Rio TGD, Dalin E, Tice H, Pitluck S, Richardson P, Bruce D, Goodwin L, Han C, Detter JC, Schmutz J, Brettin T, Land M, Hauser L, Kyrpides NC, Ivanova N, Göker M, Woyke T, Klenk H-P, Bryant DA. 2011. Complete genome sequence of the filamentous gliding predatory bacterium Herpetosiphon aurantiacus type strain (114-95T). Standards in Genomic Sciences 5: 356-370.
[13] Wasmund K, Cooper M, Schreiber L, Lloyd KG, Baker BJ, Petersen DG, Jørgensen BB, Stepanauskas R, Reinhardt R, Schramm A, Loy A, Adrian L. 2016. Single-Cell Genome and Group-Specific dsrAB Sequencing Implicate Marine Members of the Class Dehalococcoidia (Phylum Chloroflexi) in Sulfur Cycling. mBio 7.
[14] Biderre-Petit C, Dugat-Bony E, Mege M, Parisot N, Adrian L, Moné A, Denonfoux J, Peyretaillade E, Debroas D, Boucher D, Peyret P. 2016. Distribution of Dehalococcoidia in the Anaerobic Deep Water of a Remote Meromictic Crater Lake and Detection of Dehalococcoidia-Derived Reductive Dehalogenase Homologous Genes. Plos One 11.
[15] Hugenholtz, P., Goebel, B. M., & Pace, N. R. 1998. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. Journal of Bacteriology 18: 4765-4774.
[16] Xia Y, Wang Y, Wang Y, Chin FYL, Zhang T. 2016. Cellular adhesiveness and cellulolytic capacity in Anaerolineae revealed by omics-based genome interpretation. Biotechnology for Biofuels 9.
[17] Giovannoni SJ, Rappe MS, Vergin KL, Adair NL. 1996. 16S rRNA genes reveal stratified open ocean bacterioplankton populations related to the Green Non-Sulfur bacteria. Proceedings of the National Academy of Sciences 93: 7979-7984.
[18] Morris RM, Rappé MS, Urbach E, Connon SA, Giovannoni SJ. 2004. Prevalence of the Chloroflexi-related SAR202 bacterioplankton cluster throughout the mesopelagic zone and deep ocean. Appl Environ Microbiol 70: 2836-42.
[19] Mehrshad M, Rodriguez-Valera F, Amoozegar MA, López-García P, Ghai R. 2017. The enigmatic SAR202 cluster up close: shedding light on a globally distributed dark ocean lineage involved in sulfur cycling. The ISME Journal 12: 655-668.
[20] Wegner C-E, Liesack W. 2017. Unexpected Dominance of Elusive Acidobacteria in Early Industrial Soft Coal Slags. Frontiers in Microbiology 8.
[21] Ye Q, Wu Y, Zhu Z, Wang X, Li Z, Zhang J. 2016. Bacterial diversity in the surface sediments of the hypoxic zone near the Changjiang Estuary and in the East China Sea. MicrobiologyOpen 5: 323-339.
[22] Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4: 170159.
[23] Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, del Rio TG, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Sci Data 4: 170160.
[24] R Core Team. 2017. R: A language and environment for statistical computing. 3.4.3. R Foundation for Statistical Computing, Vienna, Austria.
[25] Allali I, Arnold J.W., Roach J, Cadenas M.B., Butz N, Hassan H.M., Koci M, Ballou A, Mendoza M, Ali R, Azcarate-Peril M.A. 2017. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome. BMC Microbiology 17: 1-16.
| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Otu0181 | SAR202_clade | SAR202_clade_ge | -0.1713584 | 0.3794814 | -0.4515595 | 0.6704985 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0073322 | 0.0162544 | -0.4510917 | 0.6708136 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0027496 | 0.0165362 | -0.1662767 | 0.8744539 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0023568 | 0.0055057 | -0.4280621 | 0.6864177 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0011784 | 0.0027529 | -0.4280621 | 0.6864177 |
| Otu2381 | Anaerolineae | uncultured | -0.0005237 | 0.0056008 | -0.0935100 | 0.9291298 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0005237 | 0.0056008 | -0.0935100 | 0.9291298 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0002619 | 0.0028004 | -0.0935100 | 0.9291298 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | 0.0001637 | 0.0036177 | 0.0452401 | 0.9656672 |
| Otu3712 | Anaerolineae | uncultured | 0.0008511 | 0.0055928 | 0.1521723 | 0.8850009 |
| Otu3607 | Anaerolineae | uncultured | 0.0034043 | 0.0023533 | 1.4465667 | 0.2076595 |
| Otu2790 | Anaerolineae | Thermomarinilinea | 0.0034043 | 0.0023533 | 1.4465667 | 0.2076595 |
| Otu3623 | Anaerolineae | Thermomarinilinea | 0.0036007 | 0.0053694 | 0.6705821 | 0.5322101 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | 0.0068085 | 0.0047067 | 1.4465667 | 0.2076595 |
| Otu2789 | Anaerolineae | Thermomarinilinea | 0.0068085 | 0.0047067 | 1.4465667 | 0.2076595 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | 0.0070049 | 0.0049223 | 1.4231039 | 0.2139907 |
| Otu1863 | Anaerolineae | Pelolinea | 0.0079214 | 0.0046360 | 1.7086714 | 0.1482107 |
| Otu3589 | Anaerolineae | uncultured | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu1419 | Anaerolineae | Thermomarinilinea | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu2497 | Anaerolineae | Pelolinea | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu1147 | Anaerolineae | Thermomarinilinea | 0.0146645 | 0.0110438 | 1.3278449 | 0.2416173 |
| Otu1983 | Anaerolineae | Thermomarinilinea | 0.0158101 | 0.0123144 | 1.2838745 | 0.2554599 |
| Otu1246 | Anaerolineae | Thermomarinilinea | 0.0158429 | 0.0092720 | 1.7086714 | 0.1482107 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | 0.0170213 | 0.0117667 | 1.4465667 | 0.2076595 |
| Otu0662 | Anaerolineae | Thermomarinilinea | 0.0340426 | 0.0235333 | 1.4465667 | 0.2076595 |
| Otu0551 | Anaerolineae | Thermomarinilinea | 0.0365957 | 0.0465283 | 0.7865264 | 0.4671821 |
| Otu1028 | Anaerolineae | Thermomarinilinea | 0.0374468 | 0.0258867 | 1.4465667 | 0.2076595 |
| Otu0607 | Anaerolineae | Thermomarinilinea | 0.0389853 | 0.0438394 | 0.8892756 | 0.4145865 |
| Otu0799 | Anaerolineae | Pelolinea | 0.0477578 | 0.0280660 | 1.7016226 | 0.1495636 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | 0.1946645 | 0.2102439 | 0.9258985 | 0.3969899 |
| Otu0215 | Anaerolineae | Thermomarinilinea | 0.4527660 | 0.3129935 | 1.4465667 | 0.2076595 |
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | 0.5344681 | 0.3694735 | 1.4465667 | 0.2076595 |
| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Asv1886 | D_2__SAR202 clade | D_5__ | -0.3397709 | 0.2301978 | -1.4759954 | 0.1999714 |
| Asv800 | D_2__SAR202 clade | D_5__ | -0.1327332 | 0.3683890 | -0.3603073 | 0.7333378 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.0329951 | 0.0770801 | -0.4280621 | 0.6864177 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.0164975 | 0.0385401 | -0.4280621 | 0.6864177 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.0057610 | 0.0616089 | -0.0935100 | 0.9291298 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.0039280 | 0.1354082 | -0.0290085 | 0.9779801 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.0034043 | 0.0364052 | -0.0935100 | 0.9291298 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0007856 | 0.0084012 | -0.0935100 | 0.9291298 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | 0.0011129 | 0.0027583 | 0.4034830 | 0.7032691 |
| Asv2034 | D_2__SAR202 clade | D_5__ | 0.0038298 | 0.0251674 | 0.1521723 | 0.8850009 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | 0.0057610 | 0.0801057 | 0.0719180 | 0.9454553 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | 0.0111293 | 0.0275831 | 0.4034830 | 0.7032691 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | 0.0126023 | 0.0187931 | 0.6705821 | 0.5322101 |
| Asv400 | D_2__uncultured | D_5__ | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Asv496 | D_2__Dehalococcoidia | NA | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Asv2063 | D_2__Anaerolineae | NA | 0.0189198 | 0.0468912 | 0.4034830 | 0.7032691 |
| Asv134 | D_2__Anaerolineae | NA | 0.0234043 | 0.0349014 | 0.6705821 | 0.5322101 |
| Asv1473 | D_2__SAR202 clade | NA | 0.0238298 | 0.0164733 | 1.4465667 | 0.2076595 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | 0.0238298 | 0.0164733 | 1.4465667 | 0.2076595 |
| Asv1234 | D_2__SAR202 clade | D_5__ | 0.0272340 | 0.0188267 | 1.4465667 | 0.2076595 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | 0.0288052 | 0.0429556 | 0.6705821 | 0.5322101 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | 0.0306383 | 0.0211800 | 1.4465667 | 0.2076595 |
| Asv1003 | D_2__SAR202 clade | D_5__ | 0.0306383 | 0.0211800 | 1.4465667 | 0.2076595 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | 0.0340426 | 0.0235333 | 1.4465667 | 0.2076595 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | 0.0396072 | 0.0859823 | 0.4606434 | 0.6643958 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | 0.0414075 | 0.0617486 | 0.6705821 | 0.5322101 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | 0.0418331 | 0.0911012 | 0.4591934 | 0.6653680 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | 0.0476596 | 0.0329467 | 1.4465667 | 0.2076595 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | 0.0522095 | 0.0778570 | 0.6705821 | 0.5322101 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | 0.0578723 | 0.0400067 | 1.4465667 | 0.2076595 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | 0.0583633 | 0.0676342 | 0.8629259 | 0.4276204 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | 0.0748936 | 0.0517734 | 1.4465667 | 0.2076595 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | 0.0792144 | 0.1181278 | 0.6705821 | 0.5322101 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | 0.0955810 | 0.1216822 | 0.7854968 | 0.4677332 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | 0.1054664 | 0.1601592 | 0.6585100 | 0.5393201 |
| Asv2324 | D_2__Anaerolineae | NA | 0.1089362 | 0.0753067 | 1.4465667 | 0.2076595 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | 0.1123404 | 0.0776600 | 1.4465667 | 0.2076595 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | 0.1123404 | 0.0776600 | 1.4465667 | 0.2076595 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | 0.1361702 | 0.0941334 | 1.4465667 | 0.2076595 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | 0.1468412 | 0.1522869 | 0.9642409 | 0.3792105 |
| Asv1095 | D_2__Anaerolineae | NA | 0.1634043 | 0.1129601 | 1.4465667 | 0.2076595 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | 0.1668085 | 0.1153134 | 1.4465667 | 0.2076595 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | 0.1669722 | 0.1917355 | 0.8708462 | 0.4236697 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | 0.1859247 | 0.2109964 | 0.8811750 | 0.4185601 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | 0.3438298 | 0.2376868 | 1.4465667 | 0.2076595 |
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | 0.5208511 | 0.3600602 | 1.4465667 | 0.2076595 |
| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | -0.1897296 | 0.3302310 | -0.5745358 | 0.5904867 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | -0.1736086 | 0.1582994 | -1.0967102 | 0.3227568 |
| Otu0215 | Anaerolineae | Thermomarinilinea | -0.1607263 | 0.2797499 | -0.5745358 | 0.5904867 |
| Otu0181 | SAR202_clade | SAR202_clade_ge | -0.0520091 | 0.2990620 | -0.1739074 | 0.8687601 |
| Otu0607 | Anaerolineae | Thermomarinilinea | -0.0317385 | 0.0336870 | -0.9421584 | 0.3893700 |
| Otu0551 | Anaerolineae | Thermomarinilinea | -0.0269047 | 0.0362727 | -0.7417334 | 0.4915977 |
| Otu0799 | Anaerolineae | Pelolinea | -0.0200649 | 0.0258114 | -0.7773675 | 0.4721013 |
| Otu1028 | Anaerolineae | Thermomarinilinea | -0.0132932 | 0.0231372 | -0.5745358 | 0.5904867 |
| Otu0662 | Anaerolineae | Thermomarinilinea | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Otu1983 | Anaerolineae | Thermomarinilinea | -0.0084593 | 0.0103315 | -0.8187858 | 0.4501563 |
| Otu1147 | Anaerolineae | Thermomarinilinea | -0.0084593 | 0.0092049 | -0.9189965 | 0.4002601 |
| Otu1246 | Anaerolineae | Thermomarinilinea | -0.0072508 | 0.0084400 | -0.8590967 | 0.4295406 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | -0.0060423 | 0.0105169 | -0.5745358 | 0.5904867 |
| Otu1419 | Anaerolineae | Thermomarinilinea | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu2497 | Anaerolineae | Pelolinea | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu3589 | Anaerolineae | uncultured | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | -0.0036254 | 0.0042200 | -0.8590967 | 0.4295406 |
| Otu1863 | Anaerolineae | Pelolinea | -0.0036254 | 0.0042200 | -0.8590967 | 0.4295406 |
| Otu2789 | Anaerolineae | Thermomarinilinea | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu3623 | Anaerolineae | Thermomarinilinea | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0020728 | 0.0128145 | -0.1617557 | 0.8778315 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0012945 | 0.0128349 | -0.1008584 | 0.9235824 |
| Otu3712 | Anaerolineae | uncultured | -0.0012919 | 0.0043048 | -0.3001126 | 0.7761680 |
| Otu2790 | Anaerolineae | Thermomarinilinea | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Otu3607 | Anaerolineae | uncultured | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | -0.0009643 | 0.0027703 | -0.3480925 | 0.7419469 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0006367 | 0.0043341 | -0.1469078 | 0.8889445 |
| Otu2381 | Anaerolineae | uncultured | -0.0006367 | 0.0043341 | -0.1469078 | 0.8889445 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0003254 | 0.0043410 | -0.0749568 | 0.9431557 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0003184 | 0.0021670 | -0.1469078 | 0.8889445 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0001627 | 0.0021705 | -0.0749568 | 0.9431557 |
| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | -0.1848957 | 0.3218175 | -0.5745358 | 0.5904867 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | -0.1764060 | 0.1570152 | -1.1234959 | 0.3122557 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | -0.1646951 | 0.1413959 | -1.1647802 | 0.2966535 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | -0.1439410 | 0.1112113 | -1.2943021 | 0.2521126 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | -0.1220553 | 0.2124416 | -0.5745358 | 0.5904867 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | -0.1061416 | 0.0879361 | -1.2070308 | 0.2814027 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | -0.0969221 | 0.1218861 | -0.7951862 | 0.4625657 |
| Asv800 | D_2__SAR202 clade | D_5__ | -0.0875482 | 0.2864534 | -0.3056281 | 0.7722028 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | -0.0592150 | 0.1030657 | -0.5745358 | 0.5904867 |
| Asv1095 | D_2__Anaerolineae | NA | -0.0580065 | 0.1009624 | -0.5745358 | 0.5904867 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | -0.0531726 | 0.0925488 | -0.5745358 | 0.5904867 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | -0.0483387 | 0.0841353 | -0.5745358 | 0.5904867 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | -0.0476310 | 0.0688397 | -0.6919125 | 0.5198017 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | -0.0452141 | 0.0649448 | -0.6961928 | 0.5173357 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | -0.0447133 | 0.0524914 | -0.8518223 | 0.4332067 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | -0.0398795 | 0.0694116 | -0.5745358 | 0.5904867 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | -0.0398795 | 0.0694116 | -0.5745358 | 0.5904867 |
| Asv2324 | D_2__Anaerolineae | NA | -0.0386710 | 0.0673082 | -0.5745358 | 0.5904867 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | -0.0350456 | 0.0609981 | -0.5745358 | 0.5904867 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | -0.0277948 | 0.0483778 | -0.5745358 | 0.5904867 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | -0.0265863 | 0.0462744 | -0.5745358 | 0.5904867 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.0252671 | 0.1043155 | -0.2422180 | 0.8182327 |
| Asv2063 | D_2__Anaerolineae | NA | -0.0205440 | 0.0357575 | -0.5745358 | 0.5904867 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | -0.0205440 | 0.0357575 | -0.5745358 | 0.5904867 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | -0.0193355 | 0.0336541 | -0.5745358 | 0.5904867 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | -0.0169186 | 0.0294474 | -0.5745358 | 0.5904867 |
| Asv134 | D_2__Anaerolineae | NA | -0.0157101 | 0.0273440 | -0.5745358 | 0.5904867 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | -0.0112835 | 0.0618942 | -0.1823031 | 0.8625058 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | -0.0108762 | 0.0189304 | -0.5745358 | 0.5904867 |
| Asv1003 | D_2__SAR202 clade | D_5__ | -0.0108762 | 0.0189304 | -0.5745358 | 0.5904867 |
| Asv1234 | D_2__SAR202 clade | D_5__ | -0.0096677 | 0.0168271 | -0.5745358 | 0.5904867 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv1473 | D_2__SAR202 clade | NA | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.0070038 | 0.0476747 | -0.1469078 | 0.8889445 |
| Asv2034 | D_2__SAR202 clade | D_5__ | -0.0058137 | 0.0193716 | -0.3001126 | 0.7761680 |
| Asv496 | D_2__Dehalococcoidia | NA | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Asv400 | D_2__uncultured | D_5__ | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.0045554 | 0.0607736 | -0.0749568 | 0.9431557 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.0041386 | 0.0281714 | -0.1469078 | 0.8889445 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.0022777 | 0.0303868 | -0.0749568 | 0.9431557 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0009551 | 0.0065011 | -0.1469078 | 0.8889445 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Asv1886 | D_2__SAR202 clade | D_5__ | 0.1614351 | 0.2011515 | 0.8025547 | 0.4586642 |

| | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) | FDR Adjusted p-value |
|---|---|---|---|---|---|---|---|
| Otu0181 | SAR202_clade | SAR202_clade_ge | -2.4575728 | 3.3035421 | -0.7439205 | 0.4903847 | 0.7285229 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | -0.9242423 | 2.0042270 | -0.4611465 | 0.6640586 | 0.7285229 |
| Otu0607 | Anaerolineae | Thermomarinilinea | -0.1124721 | 0.4212890 | -0.2669714 | 0.8001524 | 0.8501620 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0796436 | 0.1448049 | -0.5500061 | 0.6059809 | 0.7285229 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0796436 | 0.1448049 | -0.5500061 | 0.6059809 | 0.7285229 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu0551 | Anaerolineae | Thermomarinilinea | -0.0280166 | 0.4433827 | -0.0631883 | 0.9520649 | 0.9520649 |
| Otu2381 | Anaerolineae | uncultured | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu3712 | Anaerolineae | uncultured | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0309087 | -0.7362104 | 0.4946701 | 0.7285229 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu3623 | Anaerolineae | Thermomarinilinea | 0.0032080 | 0.0503917 | 0.0636606 | 0.9517072 | 0.9520649 |
| Otu3607 | Anaerolineae | uncultured | 0.0552843 | 0.0049066 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu2790 | Anaerolineae | Thermomarinilinea | 0.0552843 | 0.0049066 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | 0.0584922 | 0.0454851 | 1.2859653 | 0.2547855 | 0.5414192 |
| Otu1863 | Anaerolineae | Pelolinea | 0.0991909 | 0.0280249 | 3.5393861 | 0.0165734 | 0.0375663 |
| Otu2789 | Anaerolineae | Thermomarinilinea | 0.1105686 | 0.0098132 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | 0.1105686 | 0.0098132 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1983 | Anaerolineae | Thermomarinilinea | 0.1185885 | 0.1161660 | 1.0208533 | 0.3541507 | 0.6689512 |
| Otu1147 | Anaerolineae | Thermomarinilinea | 0.1203422 | 0.1022047 | 1.1774631 | 0.2920002 | 0.5840003 |
| Otu1246 | Anaerolineae | Thermomarinilinea | 0.1983818 | 0.0560498 | 3.5393861 | 0.0165734 | 0.0375663 |
| Otu3589 | Anaerolineae | uncultured | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1419 | Anaerolineae | Thermomarinilinea | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu2497 | Anaerolineae | Pelolinea | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | 0.2764214 | 0.0245330 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu0662 | Anaerolineae | Thermomarinilinea | 0.5528428 | 0.0490660 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1028 | Anaerolineae | Thermomarinilinea | 0.6081271 | 0.0539726 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu0799 | Anaerolineae | Pelolinea | 0.6618074 | 0.1140109 | 5.8047714 | 0.0021396 | 0.0055959 |
| Otu0215 | Anaerolineae | Thermomarinilinea | 7.3528091 | 0.6525773 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | 8.6796317 | 0.7703356 | 11.2673382 | 0.0000962 | 0.0002726 |

| | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) | FDR Adjusted p-value |
|---|---|---|---|---|---|---|---|
| Asv800 | D_2__SAR202 clade | D_5__ | -2.9695672 | 3.0816822 | -0.9636189 | 0.3794937 | 0.8004961 |
| Asv1886 | D_2__SAR202 clade | D_5__ | -2.2758300 | 2.2620814 | -1.0060778 | 0.3605552 | 0.8004961 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | -1.2574021 | 1.7629163 | -0.7132512 | 0.5075876 | 0.8004961 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | -1.1503411 | 1.9735619 | -0.5828756 | 0.5852742 | 0.8004961 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | -1.1127006 | 1.4059595 | -0.7914172 | 0.4645707 | 0.8004961 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | -1.0569670 | 1.0591512 | -0.9979378 | 0.3641245 | 0.8004961 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.6485262 | 1.1827890 | -0.5483025 | 0.6070659 | 0.8004961 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | -0.6042138 | 1.4769566 | -0.4090938 | 0.6994049 | 0.8218007 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | -0.5119943 | 0.8044173 | -0.6364785 | 0.5524571 | 0.8004961 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | -0.4892390 | 0.7585529 | -0.6449637 | 0.5473730 | 0.8004961 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | -0.3868402 | 0.6996938 | -0.5528707 | 0.6041591 | 0.8004961 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.3185743 | 0.6912399 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.2503083 | 0.5431170 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2063 | D_2__Anaerolineae | NA | -0.1934201 | 0.4196813 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.1592871 | 0.3456199 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.1479095 | 0.3209328 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | -0.1137765 | 0.2468714 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2034 | D_2__SAR202 clade | D_5__ | -0.1023989 | 0.2221842 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | -0.0964323 | 0.6505274 | -0.1482371 | 0.8879484 | 0.9517072 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | 0.0112279 | 0.1763709 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv134 | D_2__Anaerolineae | NA | 0.0208518 | 0.3275459 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | 0.0256637 | 0.4031335 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | 0.0368916 | 0.5795044 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | 0.0465155 | 0.7306794 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | 0.0705752 | 1.1086170 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv400 | D_2__uncultured | D_5__ | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv496 | D_2__Dehalococcoidia | NA | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1473 | D_2__SAR202 clade | NA | 0.3869900 | 0.0343462 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | 0.3869900 | 0.0343462 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1234 | D_2__SAR202 clade | D_5__ | 0.4422742 | 0.0392528 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | 0.4975585 | 0.0441594 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1003 | D_2__SAR202 clade | D_5__ | 0.4975585 | 0.0441594 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | 0.5528428 | 0.0490660 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | 0.7739799 | 0.0686923 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | 0.9398327 | 0.0834121 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | 1.2162541 | 0.1079451 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv2324 | D_2__Anaerolineae | NA | 1.7690969 | 0.1570111 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | 1.8243812 | 0.1619177 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | 1.8243812 | 0.1619177 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | 2.2113711 | 0.1962638 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1095 | D_2__Anaerolineae | NA | 2.6536454 | 0.2355166 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | 2.7089297 | 0.2404232 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | 5.5837121 | 0.4955662 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | 8.4584946 | 0.7507092 | 11.2673382 | 0.0000962 | 0.0002380 |
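
The FDR-adjusted column in the table above is consistent with a Benjamini-Hochberg correction, which can be checked with base R's p.adjust(). The sketch below is not the original analysis code: it assumes that the 47 rounded p-values printed in the table are the full set of tests that were corrected together, and the object names are illustrative.

```r
# Rounded p-values transcribed from the ASV table above (47 rows; order does not matter).
# Assumption: these 47 tests form the complete family that was corrected together.
p_raw <- c(
  0.3794937, 0.3605552, 0.5075876, 0.5852742, 0.4645707, 0.3641245,
  0.6070659, 0.6994049, 0.5524571, 0.5473730, 0.6041591,
  rep(0.6642415, 10),  # ten tied rows
  0.8879484,
  rep(0.9517072, 6),   # six tied rows
  rep(0.0000962, 19)   # nineteen tied rows with the smallest p-value
)

# Benjamini-Hochberg adjustment
p_bh <- p.adjust(p_raw, method = "BH")

# For the smallest tied group, BH reduces to p * n / (rank of the last tied value)
by_hand <- 0.0000962 * length(p_raw) / 19

print(signif(min(p_bh), 3))
print(signif(by_hand, 3))
```

Both printed values agree with the smallest adjusted p-value in the table to three significant figures; differences in later digits are expected because the inputs are already rounded.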
Analysis of DNA and RNA sequences obtained from Saanich Inlet using high-throughput sequencing has allowed us to examine the nitrogen cycle in this area. This study serves as a useful model for examining microbial community responses to environmental changes such as shifts in oxygen levels. Although various genes encode enzymes that are crucial to the nitrogen cycle, our study focuses on the abundance of the norC gene, which encodes the nitric oxide reductase subunit C, at various depths. Using the TreeSAPP pipeline, it was shown that at all but one depth, norC was more abundant at the genome level than at the expression level. All classified families that contained norC in their genome were also found to express it, although the ratios between DNA and RNA abundance differed between families. Gammaproteobacteria contributed the most to norC expression levels at all but one depth, and represented the class with the most norC at the genome level as well. A notable exception for expression levels was the Epsilonproteobacteria, which were found to express the most norC at 200m. Expression of norC correlated significantly with only one nitrogen species, NH4+, and this correlation was positive. Relationships with all other nitrogen species were negative and not significant. Although our study focused on one specific gene, the methods could be used to examine the abundance of other genes involved in the nitrogen cycle or in other biogeochemical processes, as well as to gain insight into how microbial communities respond to changes in their surrounding environment.
Saanich Inlet is a seasonally anoxic fjord located between Vancouver Island and the Saanich Peninsula [1]. It is 24 km long and has a basin of up to 234 meters in depth [2]. It has a 75-meter sill which acts to protect the deeper waters [3]. Because of this sill and the constantly high input of organic material from freshwater discharge and primary production in surface waters, its conditions below 110 meters are anoxic [3]. Oxygen replenishment is dependent on the season, occurring mostly in the fall, which modifies the oxygen gradient and thereby the environmental conditions for the microbial community that inhabits the inlet [3]. Dissolved oxygen increases gradually from a minimum concentration at greater depths up to its peak concentration at the surface, due to phytoplankton metabolism and gas exchange between surface waters and the atmosphere [3]. Nitrate reduction by denitrifiers happens mostly in the deep water following oxygenation [3]. This results in a steep nitrate gradient across the different depths within the fjord [3]. A study by Zaikova et al. found that microbial diversity was highest in the hypoxic transition area and that it decreased within the anoxic basin waters [1]. It is essential to study the roles of various microorganisms within Saanich Inlet in order to understand how they affect environmental conditions like greenhouse gases, methane, and denitrogenation on a larger scale in the world's oceans [3].
Oxygen minimum zones (OMZs) are areas of the ocean, typically occurring between depths of about 200 to 1000 meters, where oxygen concentrations are at their lowest [4]. Global OMZs are normally found along the western boundaries of continents, where upwelling brings nutrient-rich waters from the deep ocean to surface waters. This influx of nutrients increases primary production and therefore the input of organic particles and respiration rates at depth [5]. OMZs are also found in coastal basins where restricted circulation decreases the mixing of deep and surface waters [5]. Saanich Inlet is a glacially carved fjord and is an example of a coastal basin in which the water circulation is decreased by an entrance sill. This in turn restricts oxygen-rich water from entering the basin and creates a seasonally anoxic zone at depth [5].
A reaction performed by microbes through multispecies microbial interaction is the conversion of the inert elemental nitrogen gas (N2) to a form that plants can use for nucleic acid and protein synthesis [6]. The reductive and irreversible process of converting elemental N2 to NH4+ is catalyzed by nitrogenase, a conserved enzyme complex that is inhibited in the presence of oxygen [6]. NH4+ can be oxidized to nitrate (NO3-) in the presence of oxygen through a two-step process. The first step is the oxidation of ammonium (NH4+) to nitrite (NO2-) by a particular group of Bacteria or Archaea, and the second step involves oxidation of NO2- to NO3- by a different group of nitrifying microbes [6]. Finally, opportunistic microbes use NO2- and NO3- as electron acceptors in the absence of oxygen for the oxidation of organic matter. This process ultimately leads to the formation of N2 and completion of the nitrogen cycle [6]. The latter process is called denitrification and consists of the dissimilatory reduction of NO2- and NO3- by denitrifying facultative anaerobes to nitric oxide (NO) and nitrous oxide (N2O). These two nitrogen species are classified as ionic nitrogen oxides and act as terminal electron acceptors in anaerobic conditions where they are reduced to dinitrogen (N2) [7]. Four functional enzymes are involved in the process of denitrification: nitrate, nitrite, nitric oxide, and nitrous oxide reductases, which are encoded by the nar, nir, nor, and nos gene clusters, respectively [8].
It is important to note that denitrification increases in oxygen minimum zones (OMZs) in the ocean. The lack of oxygen results in heterotrophic microbes using other oxidants, including nitrate and nitrite. This leads to loss of nitrogen from the ocean in the form of N2 [9]. Denitrification is an undesirable process for soil fertility and agricultural productivity as it results in loss of fertilizer nitrogen (nitrate) from soil environments [7]. However, it plays a major role in removal of nitrogen from waste, such as animal residues. Denitrification is of major ecological importance since it is responsible for the supply of N2O to the atmosphere, where it causes stratospheric reactions leading to the depletion of ozone [7]. The absence of this process would result in the accumulation of N2 in the atmosphere and of NO3-, a toxin if found at high concentrations, in water and soil environments. In conclusion, the absence of denitrification would result in disruption of the nitrogen cycle [7].
The microbial species involved in denitrification are able to use nitrogen oxides as electron acceptors in place of oxygen under anaerobic conditions. One group of denitrifying bacteria is photosynthetic; however, most are heterotrophs and some are autotrophs that utilize reduced sulphur compounds or H2 and CO2 [7]. Important denitrifiers isolated from soil and aquatic environments include members of the Achromobacter, Alcaligenes, Bacillus, Chromobacterium, Halobacterium, Hyphomicrobium, Moraxella, Paracoccus, and Pseudomonas genera [7]. Altogether, the nitrogen cycle could be more accurately represented as a complex metabolic network [10], and the Saanich Inlet is a model ecosystem for studying how the components of that network are distributed across both taxa and depth [3]. We focused our investigations on norC, which is the component of the denitrification pathway responsible for the reduction of NO to N2O.
Water samples were collected from 16 depths (10-200m) at station S3 in Saanich Inlet (48°35.500 N, 123°30.300 W) onboard MSV John Strickland during cruise 72 (August 1, 2012). Water samples (2L) were filtered through a 0.22 µm Sterivex filter to collect biomass, which was stored at -80°C for further multi-omic analyses. Geochemical data were measured in situ using a Sea-Bird SBE 43 and in the laboratory on water samples collected using various assays [3, 11].
Total genomic DNA and RNA were extracted from Sterivex filters for 7 depths (10, 100, 120, 135, 150, 165, 200m). Illumina metagenomic shotgun libraries were constructed from reverse-transcribed RNA (cDNA) and genomic DNA, and were paired-end sequenced (2x150bp technology) on the Illumina HiSeq platform at the US Department of Energy Joint Genome Institute (DOE JGI). Processing and quality control of the output reads were also done at JGI using the IMG/M pipeline, and the assembly and processing of the metagenomes were done at the University of British Columbia using MetaPathways 2.5 [11].
In-depth sampling and sequencing methods can be found here: Hawley AK et al. 2017, “A compendium of multi-omic sequence information from the Saanich Inlet water column”, Sci Data 4: 170160.
Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) was used through Google Cloud services to reconstruct the nitrogen cycle in Saanich Inlet along defined redox gradients. Computational servers were linked to Google Cloud storage using the gcsfuse command. To allow for continuous processing of the data while offline, the “nohup” command was used. DNA and RNA assemblies were processed for norC (D0502) through the TreeSAPP pipeline and SAM files were immediately removed using the following code:
# Code for norC processing of the RNA and DNA samples
rm -rf treesapp_out
for depth in 10m 100m 120m 135m 150m 165m 200m; do
  # DNA: map metagenomic reads to the assembly, then remove the SAM files
  time treesapp.py -T 8 --verbose --delete -t D0502 --rpkm \
    -i bucket/MetaG_assemblies/SI072_LV_"$depth"_DNA.scaffolds.fasta \
    -r bucket/MetaG_reads/SI072_LV_"$depth".anqdp.fastq.gz \
    -o treesapp_out/dna/"$depth"
  rm treesapp_out/dna/"$depth"/RPKM_outputs/*.sam
  # RNA: map metatranscriptomic reads to the same assembly, then remove the SAM files
  time treesapp.py -T 8 --verbose --delete -t D0502 --rpkm \
    -i bucket/MetaG_assemblies/SI072_LV_"$depth"_DNA.scaffolds.fasta \
    -r bucket/MetaT_reads/SI072_LV_"$depth".qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
    -o treesapp_out/rna/"$depth"
  rm treesapp_out/rna/"$depth"/RPKM_outputs/*.sam
done
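
The --rpkm flag in the commands above requests abundances as reads per kilobase per million mapped reads (RPKM). As a reminder, and assuming the standard definition rather than anything specific to TreeSAPP's implementation, the value for a sequence of length L base pairs with C mapped reads out of N total mapped reads is given below (the symbols C, L, and N are introduced here for illustration only):

```latex
\mathrm{RPKM} = \frac{C}{\left(L / 10^{3}\right)\left(N / 10^{6}\right)} = \frac{10^{9}\, C}{L\, N}
```

Normalizing by both sequence length and sequencing depth is what allows abundances to be compared across depths and between the DNA and RNA libraries.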
TreeSAPP output data found under treesapp_out/iTOL_output/ were visualized in a phylogenetic tree using Interactive Tree of Life (iTOL) software 4.2 v1.0 [12].
Analysis was completed in R v3.4.3 [13] using the following packages.
library(tidyverse)
library(cowplot)
library(phyloseq)
library(grid)
library(knitr)
Taxonomy, abundance, and query data were loaded from “marker_contig_map.tsv” files obtained from TreeSAPP.
treesapp.out.dir = "./treesapp_out_gene_tax_abd/"
# Example data loading
norC.DNA.10m = read_tsv(paste(treesapp.out.dir, "dna/10m/final_outputs/marker_contig_map.tsv", sep="")) %>%
select(Tax.DNA.10 = Confident_Taxonomy, Abund.DNA.10 = Abundance, Query)
The data were then combined into a single data frame.
# Combine the data frames
norC.all = norC.DNA.10m %>%
full_join(norC.DNA.100m, by = "Query") %>%
full_join(norC.DNA.120m, by = "Query") %>%
full_join(norC.DNA.135m, by = "Query") %>%
full_join(norC.DNA.150m, by = "Query") %>%
full_join(norC.DNA.165m, by = "Query") %>%
full_join(norC.DNA.200m, by = "Query") %>%
full_join(norC.RNA.10m, by = "Query") %>%
full_join(norC.RNA.100m, by = "Query") %>%
full_join(norC.RNA.120m, by = "Query") %>%
full_join(norC.RNA.135m, by = "Query") %>%
full_join(norC.RNA.150m, by = "Query") %>%
full_join(norC.RNA.165m, by = "Query") %>%
full_join(norC.RNA.200m, by = "Query") %>%
# Create a taxonomy variable from the first non-NA taxonomy column, falling back to "unclassified"
mutate(Taxonomy = coalesce(Tax.DNA.10, Tax.DNA.100, Tax.DNA.120, Tax.DNA.135,
Tax.DNA.150, Tax.DNA.165, Tax.DNA.200,
Tax.RNA.10, Tax.RNA.100, Tax.RNA.120, Tax.RNA.135,
Tax.RNA.150, Tax.RNA.165, Tax.RNA.200,
"unclassified")) %>%
# Remove old Tax variables
select(-starts_with("Tax.")) %>%
# Gather all the abundance data into 1 column
gather("Key", "Abundance", starts_with("Abund")) %>%
# Turn the Key into Depth and RNA/DNA variables
separate(Key, c("Key","Type","Depth_m"), sep = "\\.") %>%
# Remove Key variable and reorder the columns with Query at the end
select(Depth_m, Type, Abundance, Taxonomy, Query) %>%
# Make depth numerical
mutate(Depth_m = as.numeric(Depth_m)) %>%
# Separate Taxonomy into columns
separate(Taxonomy, into = c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"), sep="; ")
Note that not all queries could be classified at the species level, so those cells were filled in with NA.
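
The two separate() steps above rely on simple string splitting. As a minimal base-R sketch of the same idea (the lineage string below is a hypothetical example rather than a value from the data, and pad_ranks is an illustrative helper, not part of the original analysis):

```r
# Column names such as "Abund.DNA.10" encode three fields separated by literal dots
key_parts <- strsplit("Abund.DNA.10", ".", fixed = TRUE)[[1]]
names(key_parts) <- c("Key", "Type", "Depth_m")

# A hypothetical lineage string that is only classified down to class
lineage <- "Bacteria; Proteobacteria; Gammaproteobacteria"
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Split on "; " and pad the missing lower ranks with NA
pad_ranks <- function(x, ranks) {
  parts <- strsplit(x, "; ", fixed = TRUE)[[1]]
  length(parts) <- length(ranks)  # extending a vector fills with NA
  setNames(parts, ranks)
}

taxonomy <- pad_ranks(lineage, ranks)
print(key_parts)
print(taxonomy)
```

Ranks below the deepest classified level come out as NA, which is the behaviour described in the note above.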
Geochemical data from cruise 72 (Aug 1, 2012) were loaded in order to draw correlations with norC abundance.
load("mothur_phyloseq.RData")
metadata = data.frame(mothur@sam_data)
The variation in the abundance of norC DNA and RNA with depth was examined by plotting the abundances at the different depths. Both abundances were then stratified at the family and class level as well. The presence of norC RNA was then examined in relation to N2O, NH4+, NO2-, NO3-, and O2 at the different depths. The abundances were also related to N2O, NH4+, NO2-, NO3-, and O2 concentrations with linear models fitted using the lm() function in the stats R package, as per the code found in the results section [14].
# Save gene abundance data
gene_abd_depth = norC.all %>%
# Group data by depth and type (RNA/DNA)
group_by(Depth_m, Type) %>%
# Filter out NAs
filter(!is.na(Abundance)) %>%
# Sum up the abundance data for each depth for RNA and DNA
summarize(Abundance = sum(Abundance)) %>%
# Round the sums
mutate(Abundance = round(Abundance, digits = 2))
# Plot the abundance data
ggplot(gene_abd_depth, aes(x = Type, y = Depth_m)) +
geom_point(aes(size = Abundance)) +
geom_text(aes(label = Abundance, hjust = -0.5)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = NULL) +
theme(legend.position="none")
Figure 1 RPKM abundance (value labels) of norC at the genome and expression level across different depths for samples obtained at Saanich Inlet.
The abundance of norC differs with depth (Figure 1). No norC was found at either the genome or expression level at 10m. DNA abundance increases with depth until 150m, where it is maximal, drops abruptly at 165m, and is relatively high again at 200m. RNA abundance is relatively low down to 135m, increasing only slowly; it is notably greater at 150m, reaches its maximum at 165m, and decreases again at 200m.
The graph shows that the abundance of DNA differs from the abundance of RNA. DNA abundance is notably higher than RNA abundance at depths down to 150m. DNA abundance is maximal at 150m, where RNA abundance is lower, and RNA abundance is maximal at 165m, where DNA abundance is quite low. We see about the same amount of norC RNA and DNA at 200m.
norC.all %>%
# Change NAs to "unclassified"
mutate(Family = ifelse(is.na(Family), "unclassified", Family)) %>%
# Remove 0 abundance
mutate(Abundance = ifelse(Abundance == 0, NA, Abundance)) %>%
# Plot the data
ggplot(aes(x = Family, y = Depth_m)) +
# Keep points from overlapping
geom_point(aes(size = Abundance, color = Type), position = position_dodge(0.6)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = "Family") +
theme(axis.text.x = element_text(angle = 20, hjust = 1), plot.margin= margin(0,0,0,20))
Figure 2 RPKM abundance of norC stratified at the family level. Abundance is shown at both the genome and expression level across different depths. The size of each circle is proportional to abundance.
In total, seven fully classified families were found to contribute to the norC DNA and RNA levels, along with some unclassified Thiotrichales and fully unclassified organisms (Figure 2). All classified families that contribute to the DNA levels were also found to contribute a nonzero amount to the RNA levels.
The number of classified families contributing to the DNA and RNA levels differs between depths. The smallest number of classified families, only three, contribute at depths of 100m, 120m, and 165m, while the largest number, six, contribute at a depth of 150m. Some families were found to contribute to norC levels at only one depth, while others contribute at several depths.
At every depth, the family Candidatus Thioglobus was found to contribute the most to norC DNA abundance, while the second biggest contributors were unclassified organisms. The biggest contributors to RNA abundance at the majority of depths were unclassified organisms, whereas at 200m Helicobacteraceae contribute the most to RNA abundance.
norC.all %>%
# Change NAs to "unclassified"
mutate(Class = ifelse(is.na(Class), "unclassified", Class)) %>%
# Remove 0 abundance
mutate(Abundance = ifelse(Abundance == 0, NA, Abundance)) %>%
ggplot(aes(x = Class, y = Depth_m)) +
# Keep points from overlapping
geom_point(aes(size = Abundance, color = Type), position = position_dodge(0.6)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = "Class") +
theme(axis.text.x = element_text(angle = 20, hjust = 1))
Figure 3 RPKM abundance of norC stratified at the class level. Abundance is shown at both the genome and expression level across different depths. The size of each circle is proportional to abundance.
Viewed at the class level, it can be seen that the unclassified organisms responsible for most of the RNA are of the class Gammaproteobacteria (Figure 3).
These patterns can also be seen by visualizing the data on iTOL (sample in Appendix Figure A1). The expression of norC is observed to be localized to a few classes. Classes with the highest levels of norC expression are variable with respect to depth. Epsilonproteobacteria and Ardenticatenia were observed to have greater expression at 135m or shallower, while Gammaproteobacteria expression is more biased towards greater depths.
# Save plot of abundance data
plot_abd_depth = ggplot(gene_abd_depth, aes(x = Type, y = Depth_m)) +
geom_point(aes(size = Abundance)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = "", y = NULL) +
theme(plot.margin = unit(c(0,0,0.55,0), "cm"))
# Order the data by depth
metadata_depth = metadata %>% arrange(Depth_m)
# Example plot code
# Save plot for NO3 metadata as it changes with depth
plot_NO3 = metadata_depth %>%
ggplot(aes(x = NO3_uM, y = Depth_m)) +
geom_point() +
geom_path(aes(group = 1)) +
scale_y_reverse(lim=c(200,10)) +
labs(y = "Depth (m)", x = "NO3 (uM)") +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
plot.margin = unit(c(0,0.1,0.3,0), "cm"))
# Create composite figure
plot_grid(plot_N2O, plot_NH4, plot_NO2, plot_NO3, plot_O2, plot_abd_depth,
labels=c("A", "B", "C", "D", "E", "F"),
nrow = 1,
rel_widths=c(1.2,1,1,1,1,2.4))
Figure 4 Nitrogen species and oxygen concentrations across depth (A-E) compared to norC abundance at both the genome and expression level (F; reproduced from Figure 1)
The relation between norC abundance and nitrogen species and oxygen was examined (Figure 4). NO3- and N2O concentrations are seen to peak at 100m, where some norC DNA is observed, but not much norC RNA. NO3- and N2O then decrease with greater depth, as DNA and RNA levels increase. NO2- is relatively constant, between about 0.05uM and 0.10uM, at depths of 100m to 165m, where norC DNA and RNA levels increase. In contrast, NH4+ concentration increases with depth starting at 150m, seemingly increasing together with RNA levels. Oxygen concentration was lower where RNA abundance was higher. These possible relationships were further examined with linear models.
# Combine gene abundance data with metadata
gene_abd_depth_meta = metadata %>%
# Change N2O to also be in uM like the others
mutate(N2O_uM = N2O_nM / 1000) %>%
# Select columns of interest
select(Depth_m, NH4_uM, N2O_uM, NO2_uM, NO3_uM, O2_uM) %>%
# Rename the columns
rename("NH4 (uM)" = NH4_uM) %>%
rename("N2O (uM)" = N2O_uM) %>%
rename("NO2 (uM)" = NO2_uM) %>%
rename("NO3 (uM)" = NO3_uM) %>%
rename("O2 (uM)" = O2_uM) %>%
# Join the abundance data and metadata
gather(Nutrient,uM,-Depth_m) %>%
full_join(gene_abd_depth, by = "Depth_m") %>%
unite(Type_Nutrient, Type, Nutrient, sep = " - ") %>%
arrange(desc(Type_Nutrient))
# Plot the data
ggplot(gene_abd_depth_meta, aes(x = uM, y=Abundance)) +
geom_point() +
facet_wrap(~Type_Nutrient, scales="free", ncol = 5) +
geom_smooth(method='lm') +
labs(y = "Abundance", x = "Concentration (uM)") +
theme(text = element_text(size=18))
Figure 5 RPKM abundance of norC at different nitrogen species concentrations. Abundance is shown at both the genome and expression level.
The presence and expression of the norC gene was only positively associated with NH4+ levels, while all other correlations revealed negative trends.
# Initialize a data frame
RNA_abundance = data.frame(matrix(ncol = 5, nrow= 0))
for (type_nutrient in unique(gene_abd_depth_meta$Type_Nutrient)) {
# Get the summary of lm()
summ = gene_abd_depth_meta %>%
filter(Type_Nutrient == type_nutrient) %>%
lm(Abundance ~ uM, .) %>%
summary()
# Pull out the statistics values
coef = summ$coefficients["uM",]
# Convert into a data frame
coef_data_frame = data.frame(coef=matrix(coef),row.names=names(coef))
# Add to data frame for table
RNA_abundance[nrow(RNA_abundance) + 1,] <-c(type_nutrient, round(coef_data_frame[,1], 3))
}
# Reverse order to agree with graphs
RNA_abundance = RNA_abundance[nrow(RNA_abundance):1,]
# Name the columns
colnames(RNA_abundance) <- (c("Correlation", "Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Make a table for the data
kable(RNA_abundance,row.names=FALSE,caption="Table 1 Correlation of *norC* Abundance across Nitrogen Species and Oxygen Concentrations.")
| Correlation | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| DNA - N2O (uM) | -606.402 | 9418.328 | -0.064 | 0.951 |
| DNA - NH4 (uM) | 25.255 | 21.004 | 1.202 | 0.283 |
| DNA - NO2 (uM) | -2031.095 | 1383.87 | -1.468 | 0.202 |
| DNA - NO3 (uM) | -2.491 | 6.099 | -0.408 | 0.7 |
| DNA - O2 (uM) | -1.318 | 0.644 | -2.046 | 0.096 |
| RNA - N2O (uM) | -11946.518 | 9699.499 | -1.232 | 0.273 |
| RNA - NH4 (uM) | 56.903 | 11.746 | 4.844 | 0.005 |
| RNA - NO2 (uM) | -2268.704 | 1659.945 | -1.367 | 0.23 |
| RNA - NO3 (uM) | -11.349 | 5.228 | -2.171 | 0.082 |
| RNA - O2 (uM) | -1.27 | 0.855 | -1.485 | 0.198 |
NorC expression was found to significantly positively correlate with NH4+ levels (p = 0.005). No other significant correlations were observed at either the genome or the expression level.
Expression-level p-values (ranging p = 0.005-0.273) were generally lower than those at the genome level (ranging p = 0.096-0.951).
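
The columns of Table 1 are linked by the usual linear-model relationships: the t value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution. A minimal sketch for the RNA - NH4 row, assuming 5 residual degrees of freedom (7 depths minus 2 fitted parameters; this is an assumption, not a value reported above):

```r
# Values copied from the "RNA - NH4 (uM)" row of Table 1
estimate  <- 56.903
std_error <- 11.746

# t statistic for the slope
t_value <- estimate / std_error

# Assumed residual degrees of freedom: 7 depths - 2 parameters (intercept and slope)
df_resid <- 7 - 2

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(round(p_value, 3))
```

Under that assumption the computed values agree with the tabulated t value and p-value to the precision shown.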
While we only investigated a single component of the denitrification pathway, norC's results can be related to the complex nitrogen metabolic network. At the shallow depth of 10m, norC gene copies and transcripts are not found at detectable levels (Figure 1), which suggests that the complete denitrification pathway is not active. There could be multiple reasons why the denitrification pathway is not active in surface waters. For one, there is ample oxygen and sunlight present for phototrophs to produce energy without requiring the less-efficient denitrification pathway [15, 16]. Second, nitrate is depleted at this depth (Figure 4) and any free nitrogen is likely required for primary production and assimilatory mechanisms to support amino acid and nucleotide synthesis [17].
Aerobic denitrification has been found in harsh environments where the oxygen concentration fluctuates. It is the main source of atmospheric N2O which has been rising due to the uncontrolled use of fertilizer in agriculture and the intensification of OMZs globally [18]. The trends in NO3-, NO2- and NH4+ levels across depths in Saanich Inlet (Figure 4) are consistent with literature in which heterotrophic microbes were found to utilize other oxidants, such as NO3- and NO2-, in anoxic environments [19]. This shift in metabolic activity from oxic to anoxic waters results in NO3- and NO2- being consumed and depleted in anoxic environments as well as NH4+ being produced. While denitrifying microbes produce NO and N2O, other heterotrophic microbes in anoxic environments reduce NO3- and NO2- to NH4+, consistent with the observation of high NH4+ concentrations in the anoxic layer (Figure 4) [20]. The relationship between NH4+ levels and norC expression is shown to be positive and significant (Table 1).
Interestingly, the concentration of N2O peaks at 100m and gradually decreases, reaching a minimum at 200m (Figure 4). This could be seen as paradoxical: the abundance of norC gene copies and transcripts increases with depth, and norC is responsible for the production of N2O [20], so the concentration of N2O would be expected to increase with depth as well. However, N2O production also depends on the presence of other genes involved in the denitrification pathway, such as the nar, nir, nor, and nos gene clusters [10]. Additionally, N2O production depends on NO3- levels, which follow a trend similar to N2O concentrations (Figure 4). Thus, it is tempting to argue that the highest N2O concentrations are found at ~100m because the denitrification pathway depends on NO3- as a substrate. However, NO3- should not be present where denitrification rates are high, since it is consumed by this process, and denitrification rates have been shown to be greater at depths where oxygen is depleted [21]. Studies have also shown that N2O is consumed by microbes during the last step in canonical denitrification, which occurs in strictly anoxic environments [22]. This suggests that the low N2O levels within the anoxic layer do not represent a low rate of N2O production, but instead the production of N2O followed by its consumption by the microbial community. This last step of the canonical denitrification pathway, in which N2O is consumed, is carried out by the product of the nitrous oxide reductase gene (nosZ), and it would therefore be interesting to investigate how active this gene is at depth in the inlet.
Many of the norC-containing taxa we identified are known to be important in the nitrogen cycle. Among the identified classes, Gammaproteobacteria contributed the most to norC DNA and RNA abundance across the majority of depths (Figure 3). This comes as no surprise because Gammaproteobacteria are known to be common marine denitrifiers [23, 24]. At the family level, we found that Candidatus Thioglobus contributed the most to norC DNA abundance (Figure 2). This is consistent with the literature, as members of that group have been found to produce nitrite and are abundant in OMZs [25]. Interestingly, the family Helicobacteraceae has been found to co-occur with denitrification on elemental sulfur particles [26], which our results support because we only observed high norC expression in Helicobacteraceae at a depth of 200m, where sulfur is expected to be highest (Figure 2) [3]. Elucidating which clades contribute to the denitrification pathway by expressing norC is important because it completes one piece of the puzzle that is the distributed nitrogen metabolic network.
Distributed metabolism, as demonstrated by the nitrogen cycle, is beneficial for the overall survival of the pathway. For example, nitrogenase genes are spread out among different microbes in aerobic and anaerobic environments [27]. This broad distribution of nitrogen-fixing genes reflects the different environments these microbes inhabit [27]. Since the nitrogen cycle is vital for the survival of many plants and animals, a greater distribution of important genes and components for the cycle increases the likelihood of gene persistence through environmental adaptation. Without norC, the denitrification pathway would be incomplete, impacting the entire nitrogen cycle. Denitrification is important because it converts nitrate to nitrogen gas, removing bioavailable nitrogen and returning it to the atmosphere [27]. Without this step, nitrogen would remain fixed in the nitrate state in the ecosystem [27]. In terrestrial ecosystems, adding nitrogen can cause nutrient imbalance in trees and change forest health, eventually leading to a decline in biodiversity [27]. With recent developments in agriculture, much of the nitrogen used in urban and agricultural activities eventually leaches out of the soil and into bodies of water like rivers and streams [27]. In nearshore marine systems, excess nitrogen can lead to eutrophication [27]. Having vital nitrogen genes distributed among different microbial species helps avoid such catastrophic changes by preventing the accumulation of nitrogen in the anoxic zones of lakes and oceans. The loss of species that cannot adapt to extreme environmental pressures, whether occurring naturally or through human industrial alterations, will then not cause the breakdown of the nitrogen cycle. Other microbes that have the capacity to survive harsh environments and carry the corresponding genes will take their place within the denitrification step, ensuring a seamless continuation of the major transformations within the cycle.
When we consider the evolutionary benefits of distributing a metabolism among different microbes, one potential benefit is that it allows microbes to make their genomes smaller. Having a smaller genome can provide a selective advantage. Giovannoni studied how B vitamins are distributed in complex patterns [28]. An important point that he brings up is that the survival of the different plankton species depends on vitamins that are made within the community: "a cell that needs a vitamin can only thrive in a community that provides it" [28]. Connectedness has also become an important concept because it can help explain the response of biological communities to stresses [28]. Giovannoni observes that in fluctuating ocean conditions, the fluxes of reduced sulfur and glycine are sufficient to meet a certain species' requirement most of the time [28]. By relying on the naturally available sulfur and glycine, the microbes no longer need the genes to produce these compounds themselves, thereby reducing the energy they expend. This suggests that the fitness advantages of smaller genomes and lower cellular costs offset the potential gain in fitness that would come from producing the required compounds when they are in short supply in the environment [28]. This example does not concern the nitrogen cycle directly, but the same concept can be applied to it: by distributing the genes for the nitrogen cycle among community members, microbes can maintain smaller genomes, which gives them a selective advantage for survival.
Future studies should examine the potential presence of metabolic interactions between genera expressing the norC gene. These possible interactions may alter the abundance and the role of these genera within the community at a specific depth. Although our study focused on one specific gene, the methods could be used to examine the abundance of other genes to gain further insight into other biogeochemical processes driven by microbial metabolism. One could also examine the genomes of the organisms present in the sample to determine whether there are any other distributed metabolisms among the species. Perhaps other pathways would be discovered in which the bacteria are interdependent, each providing substrates for the other.
[1] Zaikova E, Walsh DA, Stilwell CP, Mohn WW, Tortell PD, Hallam SJ. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172-191.
[2] Herlinveaux RH. 2011. Journal of the Fisheries Research Board of Canada 19: 1-37.
[3] Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4:170159.
[4] Oxygen Minimum Zones. Keil Lab Aquatic Organic Geochemistry UW Oceanography. http://depts.washington.edu/aog/oxygen-minimum-zones/
[5] OMZ Microbes - A SCOR working group. About OMZs. http://omz.microbiology.ubc.ca/page2/index.html
[6] Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth's Biogeochemical Cycles. Science 320:1034-1039.
[7] Knowles R. 1982. Denitrification. Microbiological Reviews 46:43-70.
[8] Bergaust L, Spanning RJM, Frostegard A, Bakken LR. 2010. Expression of nitrous oxide reductase in Paracoccus denitrificans is regulated by oxygen and nitric oxide through FnrP and NNR. Microbiology 158:826-834.
[9] Altabet MA, Ryabenko E, Stramma L, Wallace DWR, Frank M, Grasse P, Lavik G. 2012. An eddy-stimulated hotspot for fixed nitrogen-loss from the Peru oxygen minimum zone. Biogeosciences 9:4897-4908.
[10] Simon J, Klotz MG. 2013. Diversity and evolution of bioenergetic systems involved in microbial nitrogen compound transformations. Biochimica et Biophysica Acta (BBA) - Bioenergetics 1827:114-135.
[11] Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, Rio TGD, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Sci Data 4:170160.
[12] Letunic I, Bork P. 2007. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23:127-128.
[13] R Core Team. 2017. R: A language and environment for statistical computing. 3.4.3. R Foundation for Statistical Computing, Vienna, Austria.
[14] Stats. (n.d.). R Documentation. Retrieved April 04, 2018, from https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/lm
[15] Zehr JP, Kudela RM. 2009. Photosynthesis in the Open Ocean. Science 326:945-946.
[16] Koike I, Hattori A. 1975. Energy Yield of Denitrification: An Estimate from Growth Yield in Continuous Cultures of Pseudomonas denitrificans under Nitrate-, Nitrite- and Nitrous Oxide-limited Conditions. Journal of General Microbiology 88:11-19.
[17] Wawrik B, Boling WB, Nostrand JDV, Xie J, Zhou J, Bronk DA. 2011. Assimilatory nitrate utilization by bacteria on the West Florida Shelf as determined by stable isotope probing and functional microarray analysis. FEMS Microbiology Ecology 79:400-411.
[18] Ji B, Yang K, Zhu L, Jiang Y, Wang H, Zhou J, Zhang H. 2015. Aerobic denitrification: A review of important advances of the last 30 years. Biotechnology and Bioprocess Engineering 20:643-651.
[19] Altabet MA, Ryabenko E, Stramma L, Wallace DWR, Frank M, Grasse P, Lavik G. 2012. An eddy-stimulated hotspot for fixed nitrogen-loss from the Peru oxygen minimum zone. Biogeosciences 9:4897-4908.
[20] Martínez-Espinosa RM, Cole JA, Richardson DJ, Watmough NJ. 2011. Enzymology and ecology of the nitrogen cycle. Biochemical Society Transactions 39:175-178.
[21] Bergaust L, Spanning RJM, Frostegard A, Bakken LR. 2010. Expression of nitrous oxide reductase in Paracoccus denitrificans is regulated by oxygen and nitric oxide through FnrP and NNR. Microbiology 158:826-834.
[22] Sun X, Jayakumar A, Ward BB. 2017. Community Composition of Nitrous Oxide Consuming Bacteria in the Oxygen Minimum Zone of the Eastern Tropical South Pacific. Frontiers in Microbiology 8.
[23] Brettar I, Moore E, Höfle M. 2001. Phylogeny and Abundance of Novel Denitrifying Bacteria Isolated from the Water Column of the Central Baltic Sea. Microbial Ecology 42:295-305.
[24] Franco DC, Signori CN, Duarte RTD, Nakayama CR, Campos LS, Pellizari VH. 2017. High Prevalence of Gammaproteobacteria in the Sediments of Admiralty Bay and North Bransfield Basin, Northwestern Antarctic Peninsula. Frontiers in Microbiology 8.
[25] Shah V, Chang BX, Morris RM. 2016. Cultivation of a chemoautotroph from the SUP05 clade of marine bacteria that produces nitrite and consumes ammonium. The ISME Journal 11:263-271.
[26] Kostrytsia A, Papirio S, Frunzo L, Mattei MR, Porca E, Collins G, Lens PN, Esposito G. 2018. Elemental sulfur-based autotrophic denitrification and denitritation: microbially catalyzed sulfur hydrolysis and nitrogen conversions. Journal of Environmental Management 211:313-322.
[27] Bernhard A. 2010. The Nitrogen Cycle: Processes, Players, and Human Impact. Nature Education Knowledge 3:25.
[28] Giovannoni SJ. 2012. Vitamins in the sea. Proceedings of the National Academy of Sciences 109:13888-13889.
Figure A1 FPKM RNA abundance (coloured bars from purple to red) of norC expression across sample depth with respect to taxonomic assignments. Selected clades are numbered (1: Gammaproteobacteria; 2: Epsilonproteobacteria; 3: Ardenticatenia; 4: Clostridia; 5: Negativicutes). FPKM bar heights are relative within the same colour only. NorC expression is not observed in species outside of what is shown.